Preface


This book is intended for students, teachers, engineers and scientists wishing
to become familiar with metaheuristics. Metaheuristics are frequently seen as
an iterative master process guiding and modifying the operations of subordinate
heuristics. As a result, works in the field are usually organized into chapters, each
presenting one metaheuristic, such as simulated annealing, tabu search, artificial ant
colonies or genetic algorithms, to name only the best known.

This book addresses metaheuristics from a new angle. It presents them as a set of 
basic principles that are combined with each other to design a heuristic algorithm. 


Heuristics and Metaheuristics 


When addressing a new problem, we try to solve it by exploiting the knowledge
acquired through experience. If the problem seems particularly difficult, we accept a
solution that is not necessarily the best possible. The aim is to discover such a solution
with a reasonable computational effort. Such a resolution method is then called a
heuristic.

By analysing a whole menagerie of metaheuristics proposed in the literature, we 
identified five major basic principles leading to the design of a new algorithm: 


1. Problem modeling. The most delicate phase when confronted with a new
problem is its modeling. Indeed, if a problem is approached from the “wrong end,”
its resolution can be largely compromised. Naturally, this phase is not
specific to metaheuristics.

2. Decomposition into sub-problems. When one has to solve a complex problem
or a large instance, it is necessary to decompose it into simpler or
smaller sub-problems. These may themselves be difficult and must then be
approached by an appropriate technique, for example a metaheuristic.

3. Building a solution. When a suitable model is found, it becomes easy to build
a solution to the problem, even if that solution is poor or even inapplicable in practice.
One of the most common construction methods is the greedy algorithm, which may
even provide exact solutions for simple problems such as the minimum spanning
tree or the shortest path.

4. Modifying a solution. The next step tries to improve a solution by applying slight
modifications. This approach can be seen as a transposition to the discrete world of
gradient methods for differentiable optimization.

5. Randomization and learning. Finally, repeating constructions or modifications
makes it possible to improve the quality of the solutions produced,
provided that a random component and/or a learning process is involved.
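To make item 3 concrete, here is a minimal sketch of a greedy construction that happens to be exact for the minimum spanning tree: Kruskal's algorithm, which repeatedly adds the cheapest edge that does not close a cycle. It is not code from this book; the function name `greedy_spanning_tree` and the input format (vertices numbered from 0, edges given as `(weight, u, v)` tuples) are illustrative assumptions.

```python
def greedy_spanning_tree(n, edges):
    """Greedy (Kruskal-style) construction of a minimum spanning tree.

    n     -- number of vertices, numbered 0..n-1 (assumed input format)
    edges -- list of (weight, u, v) tuples
    Returns the list of selected edges (u, v, weight) and their total weight.
    """
    parent = list(range(n))  # union-find forest: each vertex starts alone

    def find(v):
        # Follow parent links to the representative of v's component,
        # shortening the path on the way to keep the trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree, total = [], 0
    for w, u, v in sorted(edges):  # greedy order: cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it closes no cycle
            parent[ru] = rv
            tree.append((u, v, w))
            total += w
        # Once an edge is accepted or rejected, the choice is never revisited.
    return tree, total


# Usage on a small hypothetical instance with 4 vertices
edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3), (5, 1, 3)]
tree, total = greedy_spanning_tree(4, edges)
print(tree, total)
```

For the minimum spanning tree, this greedy rule is provably optimal; for intractable problems, the same construction principle only yields a heuristic solution.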


Table 1 Context of application of a heuristic method and a metaheuristic framework

                     | Heuristics                             | Metaheuristic
Area of application  | A generic optimization problem         | Combinatorial optimization
Knowledge to include | Specific to the problem                | Heuristic optimization methods
Data to provide      | Numerical values of a problem instance | A generic optimization problem
Result               | A heuristic solution to the instance   | A heuristic algorithm


Metaheuristics have become an essential tool to tackle difficult optimization
problems, even if they have sometimes been decried, especially in the 1980s by
people who set exact methods against heuristic ones. Since then, it has been recognized
that many exact methods embed several heuristic procedures and guarantee
optimality only with limited precision!


Book Structure 


This book is divided into three parts. The first part recalls some basics of linear 
programming, graph theory and complexity theory, and presents some simple and 
intractable combinatorial optimization problems. The aim of this first part is to make 
the field intelligible to a reader with no particular knowledge about combinatorial 
optimization. 

The second part deals with the fundamental building blocks of metaheuristics: 
the construction and improvement of solutions as well as the decomposition of a 
problem into sub-problems. Primitive metaheuristics assembling these ingredients 
and using them in iterative processes running without memory are also incorporated. 

The third part presents more advanced metaheuristics exploiting forms of
memory and learning that allow the development of more elaborate heuristics. Memory
can be exploited in different ways. One can try learning how to
build good solutions directly on the basis of statistics gathered from previous trials.
Another possibility is to exploit memories to move intelligently through the
solution space. Finally, one can store a whole set of solutions and combine them.

The book concludes with a chapter providing some advice on designing heuris- 
tics, an appendix providing source code for testing various methods discussed in the 
book and solutions to the exercises given at the end of each chapter. 


Chapter 1 “Elements of Graphs and Complexity Theory” Before developing a
heuristic for a problem, it is necessary to make sure that the problem is
difficult and that no efficient algorithm is known to solve it exactly. This chapter
includes a very brief introduction to two techniques for modeling optimization
problems: linear programming and graph theory. Formulating a problem as a
linear program describes it formally and unambiguously. Once the problem is expressed
in this form, and if its data is not too large, automatic solvers can be used to
solve it. Some solvers are even built into the spreadsheets of office suites. With a
little luck, there is no need to design a heuristic!

Many combinatorial optimization problems can be “drawn” and thus represented 
intuitively and relatively naturally by a graph. This book is illustrated by 
numerous examples of problems from graph theory. The traveling salesman 
problem is most likely the best known and has served as a guideline in the writing 
of this book. 

Some elements of complexity theory are also presented in this introductory 
chapter. This area deals with the classification of problems according to their 
difficulty. Some simple techniques are given to show that a problem is intractable.
This helps to justify why it is essential to turn to the design of heuristics. 

Chapter 2 “A Short List of Combinatorial Optimization Problems” This
chapter reviews a number of classical problems in combinatorial optimization. It
illustrates the sometimes narrow boundary between an easy problem, for which 
an efficient algorithm is known, and an intractable problem which differs merely 
in a small detail that may seem trivial at first sight. 
Likewise, it allows the reader who is not familiar with combinatorial optimization 
to discover a broad variety of problems in various domains: optimal paths, travel- 
ing salesman, vehicle routing, assignment, network flow, scheduling, clustering, 
etc. 

Chapter 3 “Problem Modeling” This chapter begins by describing techniques
for simplifying the treatment of constraints, notably by transforming the objec- 
tive into a fitness function. Then, it gives a brief overview of multi-objective 
optimization. Finally, it provides some practical applications of classical combi- 
natorial optimization problems. It gives examples of data transformations to deal 
with applications that are at first sight far from classical descriptions. 

Chapter 4 “Constructive Methods” This chapter presents methods for constructing
solutions, starting with a reminder of branch and bound (separation and evaluation)
methods, widely used for the design of exact algorithms. Next, two basic methods
are presented: random construction and greedy construction. The latter sequentially
selects the elements to be added to a partial solution, never questioning the
choices that have been made. This method can be improved by evaluating more
deeply the consequences of choosing an element. Beam search and the pilot
method belong to this category. The construction of a solution constitutes the first step
in the design of a heuristic.

Chapter 5 “Local Search” The next step is to improve an existing solution
by searching for minor changes that improve it. Local searches constitute the
backbone of most metaheuristics. These methods rely on defining, for any solution
of the problem, a set of neighbor solutions. The definition of this
set naturally depends on the modeling of the problem. Depending on the latter, a
naturally expressed neighborhood may be too small to lead to quality solutions
or, on the contrary, too large, leading to prohibitive computational times. Various
methods have been proposed to enlarge the neighborhood, such as filter and fan
or ejection chains, or to reduce it, such as granular search or candidate lists.
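As an illustration of a neighborhood-based improvement, the following minimal sketch applies a first-improvement local search to the traveling salesman problem with the classical 2-opt neighborhood: a neighbor is obtained by removing two edges of the tour and reconnecting it, which amounts to reversing a sub-path. This is not the book's appendix code; the function names and the representation (a tour as a list of city indices, cities as planar coordinates) are assumptions made for the example.

```python
import math


def tour_length(tour, coords):
    """Length of the closed tour visiting coords in the order given by tour."""
    return sum(math.dist(coords[tour[i - 1]], coords[tour[i]])
               for i in range(len(tour)))


def two_opt(tour, coords):
    """First-improvement local search in the 2-opt neighborhood.

    Stops when no 2-opt move shortens the tour (a local optimum).
    """
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = coords[tour[i]], coords[tour[i + 1]]
                c, d = coords[tour[j]], coords[tour[(j + 1) % n]]
                # Length change if edges (a, b) and (c, d) are replaced by (a, c) and (b, d)
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    # Apply the move: reverse the sub-path between the two edges
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


# Usage: a crossing tour on the corners of a unit square
coords = [(0, 0), (1, 0), (1, 1), (0, 1)]
better = two_opt([0, 2, 1, 3], coords)
print(better, tour_length(better, coords))
```

The neighborhood here has O(n²) moves; candidate lists or granular search, mentioned above, would restrict which pairs (i, j) are examined.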

Chapter 6 “Decomposition Methods” In the process of developing a new algorithm,
this chapter should logically have been placed after the one devoted to
problem modeling. However, decomposition methods are only needed when the
size of the data to be processed is large. Decomposition is therefore an optional phase, which
the reader can skip before moving on to stochastic and learning methods. This
is why the chapter is placed after construction and local search, among the chapters
devoted to the key ingredients of metaheuristics. In this chapter, we consider methods
like POPMUSIC as well as more general methods such as large neighborhood search or
fix-and-optimize.

Chapter 7 “Randomized Methods” This chapter is devoted to methods that repeat
constructions or modifications of solutions randomly and without memory.
Among the most popular techniques, we find GRASP, which integrates two
basic bricks of metaheuristics: a randomized greedy construction and a local
search. Four randomized local searches are presented in this chapter, showing
that with the same classic recipe, different heuristics can be obtained: simulated
annealing, threshold accepting, great deluge and the noising methods.
Variable neighborhood search also finds its place in this chapter.

Chapter 8 “Construction Learning” Following the order in which the key
ingredients of metaheuristics are presented, one can first seek to improve 
the solution building process. Having constructed many solutions, one can 
collect statistics on their structure and exploit this data to construct new 
solutions. Artificial ant colonies represent a typical example. Another technique, 
vocabulary building, is also discussed in this chapter. 

Chapter 9 “Local Search Learning” If local searches constitute the backbone
of metaheuristics, tabu search, which seeks to learn how to iteratively
modify a solution, can be considered the master of metaheuristics. The
term “metaheuristic” was coined by its inventor. This chapter focuses on the
ingredients that can be considered the basis of tabu search, namely the use of
memories and solution exploration strategies. Other ingredients of tabu search
proposed by its inventor, such as candidate lists, ejection chains or vocabulary
building, find a more logical place in other chapters.

Chapter 10 “Population Management” When one has a population of solutions,
one can try learning how to combine them and how to manage this
population. The most popular method in this field is undoubtedly genetic
algorithms. However, genetic algorithms are a less advanced metaheuristic than
scatter search, which provides strategies for managing a population of solutions.
The GRASP method with path relinking shows how to design a simple heuristic
integrating several basic bricks of metaheuristics, ranging from randomized construction
to population exploitation through local searches. Finally, among
the most recent metaheuristics, we find particle swarm methods, which appear well
suited to continuous optimization. It should be noted that the spreadsheets of
office suites directly integrate this type of heuristic method among their
solvers.

Chapter 11 “Heuristics Design” The concluding chapter of the book offers
some advice on designing heuristics. It returns to the difficulty that can be
encountered when modeling the problem and gives an example of decomposition
into a chain of sub-problems that are easier to handle. Next, it proposes an approach for
the development of a heuristic. Finally, some techniques for tuning parameters
and comparing the efficiency of algorithms are reviewed.


Source Codes for the Traveling Salesman Problem 


One of the reasons for the popularity of metaheuristics is that they make it possible
to address difficult problems with simple codes. This book contains several pieces of code
illustrating how to implement the basic methods discussed. Because of their level
of abstraction, these principles could be perceived as sculptures on clouds. The
codes eliminate any ambiguity in the inevitable interpretations that arise when
a metaheuristic framework is presented. Computer scientists wishing to develop a
heuristic method for a particular problem can draw inspiration from these codes.

Source code is useful only when one wants to know all the details of a
method; it is of little interest for ordinary reading. So, we have simplified and
shortened the codes, trying to limit each of them to a single page. These codes
complement the text, not the other way around. Readers with little interest in
programming, or not familiar with the programming language used, can skip them.

These codes have been tested and are relatively efficient. They address the
emblematic traveling salesman problem, which is pedagogically interesting
because its solutions can be drawn. Admittedly, these codes are not “racehorses,”
but they contain the quintessence of the methods, and their extreme brevity should
allow the reader to understand them. More than a dozen different methods have been
implemented, all together taking less than one-tenth of the number of lines of code
of one of the fastest implementations. This brevity has a cost:
the codes are somewhat compact, succinctly commented and sometimes place
several instructions on the same line. So, we ask readers used to sparser codes to
be indulgent.
Exercises


Many exercises have been devised. Their choice has always been guided by
the expected solutions, which must be as unambiguous as possible. Indeed, when
designing a new heuristic, there is no longer a correct or an incorrect solution. There
are only heuristics that work well for some problem instances and others that do
not produce satisfactory results (poor solution quality, prohibitive computation
time, etc.).

Nothing is more destabilizing for a student than ambiguous or even philosophical
answers. This is why we provide the solutions to all the exercises.
Part I
Combinatorial Optimization, Complexity 
Theory, and Problem Modeling 


This part is a gentle introduction to some basics of linear programming, graph 
theory, and complexity theory and presents some simple and difficult combinatorial 
optimization problems. The purpose of this introductory part is to make the domain 
intelligible to a reader who does not have specific knowledge in modeling such 
problems. 


Chapter 1
Elements of Graphs and Complexity Theory


Before designing a heuristic method to find good solutions to a problem, it is 
necessary to be able to formalize it mathematically and to check that it belongs 
to a difficult class. Thus, this chapter recalls some elements and definitions in graph 
theory and complexity theory in order to make the book self-contained. On the one 
hand, basic algorithmic courses very often include graph algorithms. Some of these 
algorithms have simply been transposed to solve difficult optimization problems in 
a heuristic way. On the other hand, it is important to be able to determine whether a 
problem falls into the category of difficult problems. Indeed, one will not develop a 
heuristic algorithm if there is an efficient algorithm to find an exact solution. 


1.1 Combinatorial Optimization 


The typical field of application of metaheuristics is combinatorial optimization. Let 
us briefly introduce this domain with an example of a combinatorial problem: the 
coloring of a geographic map. The aim is to assign a color to each country drawn
on a map so that any two countries that have a common border do not receive the
same color. In Fig. 1.1, five different colors are used, without worrying about the 
political attribution of the islands or enclaves. 

This is a combinatorial problem. Indeed, if there are n areas to color with five
colors, there are $5^n$ different ways to color the map. Most of these colorings are
infeasible because they violate the constraint that two areas with a common
border must not receive the same color. One could ask whether there
is a feasible coloring using only four colors. More generally, one may want to find
a coloring using a minimum number of colors. Consequently, we are dealing here
with a combinatorial optimization problem.
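The size of the search space can be felt with a naive enumeration. The sketch below, a hypothetical helper rather than a method from the book, generates all $k^n$ colorings of n areas and counts those in which no border separates two identical colors. The four-area ring map used in the usage example is invented for illustration.

```python
from itertools import product


def count_proper_colorings(vertices, edges, k):
    """Enumerate all k**n colorings and count those respecting every border.

    vertices -- list of area names; edges -- list of pairs of bordering areas
    """
    position = {v: p for p, v in enumerate(vertices)}
    feasible = 0
    for colors in product(range(k), repeat=len(vertices)):  # k**n candidates
        if all(colors[position[u]] != colors[position[v]] for u, v in edges):
            feasible += 1
    return feasible


# Hypothetical map: four areas arranged in a ring, A-B-C-D-A
ring = ["A", "B", "C", "D"]
borders = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(count_proper_colorings(ring, borders, 5), "feasible colorings out of", 5 ** 4)
```

Even with only four areas, 5⁴ = 625 colorings must be examined; the work grows exponentially with the number of areas, which is why such enumeration is hopeless for real maps.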
Fig. 1.1 An old European map colored with five colors (taking the background into account)


How can this problem be modeled more formally? Let us take a smaller example (see
Fig. 1.2): can we color Switzerland (s) and its neighboring countries (Germany d,
France f, Italy, Liechtenstein, and Austria a) with three colors?

A first model can be written using 18 binary variables that appear in equations
or inequalities. Let us introduce variables $x_{s1}, x_{s2}, x_{s3}, x_{d1}, x_{d2}, \dots, x_{a3}$ that must
take either the value 1 or 0. $x_{ik} = 1$ means that country i receives color k. Now, we can
impose that a given country i receives exactly one color by writing the equation
$x_{i1} + x_{i2} + x_{i3} = 1$. To avoid assigning the same color to two countries i and j
having a common border, we can write three inequalities (one for each color):

$x_{i1} + x_{j1} \le 1$, $x_{i2} + x_{j2} \le 1$ and $x_{i3} + x_{j3} \le 1$.

Another model can introduce 18 Boolean variables $b_{s1}, b_{s2}, b_{s3}, b_{d1}, b_{d2}, \dots,
b_{a3}$ that indicate the color (1, 2 or 3) of each country. $b_{ik}$ = true means that country
i receives color k. Now, we write a long Boolean formula that is true if and only if
there is a feasible 3-coloring. First of all, we can impose that Switzerland is colored
with at least one color: $b_{s1} \lor b_{s2} \lor b_{s3}$. But it should not receive both color 1 and color
2 at the same time: this can be written $\overline{b_{s1} \land b_{s2}}$, which is equivalent to $\overline{b_{s1}} \lor \overline{b_{s2}}$.
Then, it should also not receive both color 1 and color 3, or both color 2 and color 3. Thus,
to impose that Switzerland is colored with exactly one color, we have the conjunction
of four clauses:


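$(b_{s1} \lor b_{s2} \lor b_{s3}) \land (\overline{b_{s1}} \lor \overline{b_{s2}}) \land (\overline{b_{s1}} \lor \overline{b_{s3}}) \land (\overline{b_{s2}} \lor \overline{b_{s3}})$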


For each of the countries concerned, it is also necessary to write a conjunction of
four clauses, but with the variables corresponding to the other countries. Finally, for
each border, it is necessary to impose that the colors on both sides are different. For
example, for the border between Switzerland and France, we must have:


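$(\overline{b_{s1}} \lor \overline{b_{f1}}) \land (\overline{b_{s2}} \lor \overline{b_{f2}}) \land (\overline{b_{s3}} \lor \overline{b_{f3}})$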


Now a question arises: how many variables are needed to color a map with n
countries, which have a total of m common borders, using k colors? Another one is:
how many constraints (equations, inequalities or clauses) are needed to describe the
problem? First, it is necessary to introduce $n \cdot k$ variables. Then, for each country, we
can write one equation or $1 + k(k-1)/2$ clauses to be sure that each country receives
exactly one color. Finally, for each border, it is necessary to write k inequalities
or k clauses, that is, $m \cdot k$ in total. The problem of coloring such a map with k colors has a solution
if and only if there is a value 1 or 0 for each of the binary variables, or a value
true or false for each of the Boolean variables, such that all the constraints are
simultaneously satisfied.

The Boolean model is called the satisfiability problem (SAT). It plays a central
role in complexity theory. This lengthy development formalizes the problem
by a set of equations or inequalities, or by a single long Boolean formula, but does
not yet tell us how to discover a solution!

An extremely primitive algorithm to find a solution is to examine all the possible
values for the variables (there are $2^{nk}$ different sets of values) and, for each set,
to check whether the formula is true.
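The Boolean model and the primitive enumeration can be sketched together. In this illustration (the function names and the literal encoding `(vertex, color, is_positive)` are assumptions, not notation from the book), `coloring_clauses` writes, for each country, one "at least one color" clause plus one clause per pair of colors, and, for each border, one clause per color; `brute_force_sat` then tries every truth assignment. Colors are numbered from 0 in the code.

```python
from itertools import combinations, product


def coloring_clauses(vertices, edges, k):
    """Clauses of the Boolean k-coloring model.

    A literal is a tuple (vertex, color, is_positive); a clause is a list of
    literals and holds when at least one of its literals is true.
    """
    clauses = []
    for v in vertices:
        clauses.append([(v, c, True) for c in range(k)])    # at least one color
        for c1, c2 in combinations(range(k), 2):            # never two colors at once
            clauses.append([(v, c1, False), (v, c2, False)])
    for u, v in edges:                                      # both sides of a border differ
        for c in range(k):
            clauses.append([(u, c, False), (v, c, False)])
    return clauses


def brute_force_sat(vertices, edges, k):
    """Try all 2**(n*k) truth assignments; return a satisfying one or None."""
    clauses = coloring_clauses(vertices, edges, k)
    variables = [(v, c) for v in vertices for c in range(k)]
    for values in product((False, True), repeat=len(variables)):
        b = dict(zip(variables, values))
        if all(any(b[(v, c)] == positive for v, c, positive in clause)
               for clause in clauses):
            return b
    return None


# A hypothetical three-country map where every pair of countries shares a border
tri = ["a", "b", "c"]
tri_edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(len(coloring_clauses(tri, tri_edges, 3)))   # n*(1 + k*(k-1)/2) + m*k clauses
print(brute_force_sat(tri, tri_edges, 2))         # no 2-coloring exists
```

Even this tiny map with k = 3 already has $2^9 = 512$ assignments to inspect; the exponential growth in $n \cdot k$ makes the method unusable beyond toy instances.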

As modeled above, coloring a map is a decision problem. Its answer is either
true (a feasible coloring with k colors exists) or false (this is impossible).
Assuming that an algorithm $\mathcal{A}$ is available to obtain the values to assign to the
variables so that all equations or inequalities are satisfied or the Boolean formula is
true (or to report that such values do not exist), is it possible to solve the optimization
problem: what is the minimum number k of colors for which a feasible coloring exists?

One way to answer this question is to note that at most n colors are needed for n
areas: it suffices to assign a distinct color to each of them. As a result, we know that an
n-coloring exists. We can apply the algorithm $\mathcal{A}$ to ask for an $(n-1)$-coloring, then an
$(n-2)$-coloring, etc., until getting the answer that no coloring exists. The last value for
which the algorithm has found a solution corresponds to an optimal coloring.

A faster technique is to proceed by dichotomy: rather than reducing the number
of colors by one unit at each call of algorithm $\mathcal{A}$, two values, $k_{\min}$ and $k_{\max}$, are
stored such that it is known that no feasible coloring with $k_{\min}$ colors exists, while a feasible
coloring with $k_{\max}$ colors exists. Excluding the trivial map that has no border, we
know that we can start with $k_{\min} = 1$ and $k_{\max} = n$. The algorithm is asked for a
$k = \lfloor (k_{\min} + k_{\max})/2 \rfloor$ coloring. If the answer is yes, we set $k_{\max} \leftarrow \lfloor (k_{\min} + k_{\max})/2 \rfloor$;
if the answer is no, we set $k_{\min} \leftarrow \lfloor (k_{\min} + k_{\max})/2 \rfloor$. The method is repeated until
$k_{\max} = k_{\min} + 1$. The value $k_{\max}$ then corresponds to the optimum number of colors. So, an
optimization problem can be solved with an algorithm answering the corresponding
decision problem.
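Both strategies for turning the decision algorithm into an optimization algorithm can be sketched in a few lines. Here the decision algorithm $\mathcal{A}$ is replaced by a hypothetical exhaustive stand-in, `is_k_colorable`, only usable on tiny graphs; any other oracle with the same signature could be passed instead.

```python
from itertools import product


def is_k_colorable(vertices, edges, k):
    """Hypothetical stand-in for algorithm A: exhaustive search over k**n colorings."""
    position = {v: p for p, v in enumerate(vertices)}
    return any(all(col[position[u]] != col[position[v]] for u, v in edges)
               for col in product(range(k), repeat=len(vertices)))


def min_colors_linear(vertices, edges, oracle=is_k_colorable):
    """Ask for n-1, n-2, ... colors until the oracle answers no."""
    k = len(vertices)                       # an n-coloring always exists
    while k > 1 and oracle(vertices, edges, k - 1):
        k -= 1                              # a (k-1)-coloring exists
    return k


def min_colors_dichotomy(vertices, edges, oracle=is_k_colorable):
    """Keep k_min known infeasible and k_max known feasible; halve the gap.

    Assumes at least one border, so that a 1-coloring is infeasible.
    """
    k_min, k_max = 1, len(vertices)
    while k_max > k_min + 1:
        k = (k_min + k_max) // 2
        if oracle(vertices, edges, k):
            k_max = k                       # a k-coloring exists
        else:
            k_min = k                       # no k-coloring exists
    return k_max


# Usage on a hypothetical ring of five areas (an odd cycle needs three colors)
ring5 = list(range(5))
borders5 = [(i, (i + 1) % 5) for i in range(5)]
print(min_colors_linear(ring5, borders5), min_colors_dichotomy(ring5, borders5))
```

For n areas, the linear strategy may call the oracle up to n − 1 times, whereas the dichotomy needs only about $\log_2 n$ calls.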
1.1.1 Linear Programming


Linear programming is an extremely useful tool for mathematically modeling 
many optimization problems. Mathematical programming is the selection of a best 
element, with regard to some quantitative criterion, from some set of feasible 
alternatives. When the expression of this criterion is a linear function and all feasible 
alternatives can be described by means of linear functions, we are talking about 
linear programming. 

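A linear program in canonical form can be written mathematically as
follows:

$$
\begin{aligned}
\text{Maximize}\quad & z = c_1x_1 + c_2x_2 + \dots + c_nx_n && (1.1)\\
\text{Subject to:}\quad & a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \le b_1 && (1.2)\\
& a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \le b_2 \\
& \qquad \vdots \\
& a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n \le b_m \\
& x_j \ge 0 \quad (j = 1, \dots, n) && (1.3)
\end{aligned}
$$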


z represents the objective function and the $x_j$ the decision variables. For a production
problem, the $c_j$ can be seen as revenues, the $b_i$ as the quantities of raw material
available and the $a_{ij}$ as the unit consumption of material i for the production of good
j.

The canonical form of linear programming is not restrictive, in the sense that
any linear program can be expressed in this form. Indeed, if the objective is
to minimize z, this is equivalent to maximizing −z; if a variable x can be
positive, negative or null, it can be replaced by $x'' - x'$, where $x''$ and $x'$ must be
nonnegative; finally, an equality constraint $a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n =
b_i$ can be replaced by the two constraints $a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n \le b_i$ and
$-a_{i1}x_1 - a_{i2}x_2 - \dots - a_{in}x_n \le -b_i$.
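Two of these transformations can be applied mechanically. The following sketch, a hypothetical helper with plain lists as the assumed data layout, negates the objective of a minimization problem and replaces each equality constraint by a pair of inequalities; splitting free variables into $x'' - x'$ is left out to keep it short.

```python
def to_canonical(c, A_le, b_le, A_eq=(), b_eq=(), minimize=False):
    """Rewrite  opt c.x  s.t.  A_le x <= b_le,  A_eq x = b_eq,  x >= 0
    as  max c'.x  s.t.  A x <= b,  x >= 0  (canonical form).

    All variables are assumed nonnegative; free variables are not handled.
    """
    c_can = [-cj for cj in c] if minimize else list(c)  # min z  <=>  max -z
    A = [list(row) for row in A_le]
    b = list(b_le)
    for row, bi in zip(A_eq, b_eq):
        A.append(list(row))                 # a.x <= b
        b.append(bi)
        A.append([-aij for aij in row])     # -a.x <= -b
        b.append(-bi)
    return c_can, A, b


# Hypothetical example: minimize x1 + 2 x2  s.t.  x1 + x2 <= 4,  x1 - x2 = 1
print(to_canonical([1, 2], [[1, 1]], [4], [[1, -1]], [1], minimize=True))
```

The optimal solution x is unchanged by these rewritings; only the optimal objective value changes sign when a minimization is turned into a maximization.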

The map coloring problem can be modeled by a slightly special linear program.
For that, one introduces the variables $y_k$ that indicate whether color k is used ($y_k = 1$)
or not ($y_k = 0$), $k = 1, \dots, n$, in addition to the variables $x_{ik}$ that indicate whether area i
receives color k. The following integer linear program formalizes the optimization
version of the map coloring problem:

$$
\begin{aligned}
\text{Minimize}\quad & z = \sum_{k=1}^{n} y_k && (1.4)\\
\text{Subject to:}\quad & \sum_{k=1}^{n} x_{ik} = 1 \qquad i = 1, \dots, n && (1.5)\\
& x_{ik} - y_k \le 0 \qquad i, k = 1, \dots, n && (1.6)\\
& x_{ik} + x_{jk} \le 1 \qquad \forall (i, j) \text{ having a common border}, \; k = 1, \dots, n && (1.7)\\
& x_{ik}, y_k \in \{0, 1\} && (1.8)
\end{aligned}
$$


The objective (1.4) is to use the minimum number of colors. The first set of
constraints (1.5) imposes that each vertex receives exactly one color; the second set (1.6)
ensures that a vertex is not assigned an unused color; the set (1.7) prevents the
same color from being assigned to contiguous areas. The integrality constraints (1.8) can
also be written with linear inequalities ($y_k \ge 0$, $y_k \le 1$, $y_k \in \mathbb{Z}$).
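Checking a candidate solution against this model is straightforward, which helps when experimenting with it. The sketch below is a hypothetical helper (areas and colors indexed from 0, borders given as pairs of area indices) that tests constraints (1.5) to (1.8) and returns the objective value (1.4) of a feasible solution.

```python
def check_coloring_ilp(n, borders, x, y):
    """Check constraints (1.5)-(1.8) for 0/1 matrices x[i][k] and vector y[k].

    Returns the objective value z = sum(y) if feasible, None otherwise.
    """
    binary = (all(v in (0, 1) for row in x for v in row)
              and all(v in (0, 1) for v in y))                                   # (1.8)
    one_color = all(sum(x[i]) == 1 for i in range(n))                            # (1.5)
    used = all(x[i][k] <= y[k] for i in range(n) for k in range(n))              # (1.6)
    distinct = all(x[i][k] + x[j][k] <= 1 for i, j in borders for k in range(n)) # (1.7)
    return sum(y) if binary and one_color and used and distinct else None


# Hypothetical map: three areas in a row, 0-1 and 1-2 sharing borders
x = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]   # areas 0 and 2 get color 0, area 1 gets color 1
print(check_coloring_ilp(3, [(0, 1), (1, 2)], x, [1, 1, 0]))
```

Such a checker only verifies a given solution; finding an optimal one remains the hard part.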

Linear programming is a very powerful tool for modeling and formalizing
problems. If there are no integrality constraints, problems with thousands of variables
and thousands of constraints can be solved efficiently. In this case, the resolution is
barely more complex than the resolution of a system of linear equations. The key
limitations are essentially the memory space required for data storage and
any numerical problems that may occur if the data is poorly conditioned.

However, integer linear programs, like the coloring problem expressed above, are 
generally difficult to solve, and specific techniques should be designed. Metaheuris- 
tics are among these techniques. 

While formulating a problem as a linear program allows a
rigorous modeling, it does not help our mind much in solving it. Indeed, sight
is the most important of our senses. As the adage says, a small drawing is better than a
long speech. Graphs are a more appropriate way for our mind to perceive
a problem. Before presenting other models for the coloring problem (see Sect. 2.8),
some definitions in graph theory are recalled so that this book is self-contained.


1.1.2 A Small Glossary on Graphs and Networks 


Graphs are a very useful tool for problem modeling when there are elements that
have relationships between them. Each element is represented by a point, and
two related elements are connected by a segment. Thus, the map coloring problem
seen previously can be drawn as a small graph, as shown in Fig. 1.2.


1.1.2.1 Undirected Graph, Vertex, (Undirected) Edge 


An undirected graph G is a pair of a set V of elements called vertices or nodes and
of a set E of undirected edges, each of them associated with an (unordered) pair of
nodes, which are their endpoints. Such a graph is denoted $G = (V, E)$. A vertex of
a graph is represented by a point or a circle. An edge is represented by a line.

Fig. 1.2 Switzerland and its neighbor countries that we want to color. Each country is symbolized
by a disk, and a common border is symbolized by a line connecting the corresponding countries.
The map coloring can be transformed into the coloring of the vertices of a graph

If two vertices v and w are joined by an edge, they are adjacent. The edge is 
incident with v and w. 

When several edges connect the same pair of vertices, we have multiple edges. 
When both endpoints of an edge are the same vertex, this is a loop. 

When $V = \emptyset$ (and $E = \emptyset$), we have the null graph. When $V \ne \emptyset$ and $E = \emptyset$,
we have an empty graph. A graph with no loop and no multiple edges is a simple
graph; otherwise, this is a multigraph. Figure 1.2 depicts a simple graph.

The complement graph $\overline{G}$ of a simple graph G has the same set of vertices, and
two distinct vertices of $\overline{G}$ are adjacent if and only if they are not adjacent in G.
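The definition translates directly into code. In this minimal sketch (a hypothetical helper; a simple graph is assumed to be given as a vertex list and a list of vertex pairs), every pair of distinct vertices that is not an edge of G becomes an edge of the complement.

```python
from itertools import combinations


def complement(vertices, edges):
    """Edges of the complement of a simple undirected graph."""
    present = {frozenset(e) for e in edges}          # orientation of pairs is irrelevant
    return [(u, v) for u, v in combinations(vertices, 2)
            if frozenset((u, v)) not in present]


# Usage: the complement of the path 1-2-3-4
print(complement([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]))
```

A graph with n vertices and m edges has a complement with $n(n-1)/2 - m$ edges.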


1.1.2.2 Directed Graph, Arcs 


In some cases, the relationships between the pairs of elements are ordered. We then
have a directed graph, or digraph. The edges of a digraph are called arcs or directed
edges. An arc is represented by an arrow connecting its endpoints.

It is therefore necessary to distinguish both endpoints of an arc (i, j). The starting
point i is called the tail and the arrival point j is the head. j is a direct successor of
i and i is a direct predecessor of j. The set of direct successors of a node i is written
Succ(i), and the set of its direct predecessors Pred(i).

An arc whose tail and head are the same vertex is also called a loop, as for the 
undirected case. Two arcs having the same tail and the same head are parallel or 
multiple arcs. 
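The sets Succ(i) and Pred(i) can be built in a single pass over the arcs. This is a minimal sketch, assuming arcs are given as `(tail, head)` pairs; the function name is hypothetical.

```python
from collections import defaultdict


def successors_predecessors(arcs):
    """Build Succ(i) and Pred(i) from a list of arcs (tail, head)."""
    succ, pred = defaultdict(set), defaultdict(set)
    for tail, head in arcs:
        succ[tail].add(head)   # head is a direct successor of tail
        pred[head].add(tail)   # tail is a direct predecessor of head
    return succ, pred


# Usage on a small digraph with arcs 1->2, 1->3 and 3->2
succ, pred = successors_predecessors([(1, 2), (1, 3), (3, 2)])
print(sorted(succ[1]), sorted(pred[2]))
```

Because sets are used, parallel arcs collapse into a single successor entry; a list would be needed to keep multiplicities.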
1.1.2.3 Incidence Matrix


The incidence matrix A of a graph with n vertices and m arcs and without loops
is a matrix with n rows and m columns. The coefficients $a_{ij}$ ($i = 1, \dots, n$, $j =
1, \dots, m$) of A are defined as follows:

$$
a_{ij} = \begin{cases}
-1 & \text{if vertex } i \text{ is the tail of arc } j\\
1 & \text{if vertex } i \text{ is the head of arc } j\\
0 & \text{otherwise}
\end{cases}
$$

In the case of an undirected graph, both endpoints are represented by 1s in the
vertex-edge incidence matrix. It should be noted that the incidence matrix cannot
properly represent loops.
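A direct construction of the vertex-arc incidence matrix follows. This sketch assumes vertices numbered 0 to n−1 and column j corresponding to the j-th arc of the input list; it rejects loops, which, as noted above, this matrix cannot represent.

```python
def incidence_matrix(n, arcs):
    """Vertex-arc incidence matrix (n rows, m columns) of a loopless digraph.

    Vertices are numbered 0..n-1; column j corresponds to arcs[j] = (tail, head).
    """
    A = [[0] * len(arcs) for _ in range(n)]
    for j, (tail, head) in enumerate(arcs):
        if tail == head:
            raise ValueError("loops cannot be represented in an incidence matrix")
        A[tail][j] = -1   # the vertex is the tail of arc j
        A[head][j] = 1    # the vertex is the head of arc j
    return A


# Usage: arcs 0->1, 1->2 and 0->2
for row in incidence_matrix(3, [(0, 1), (1, 2), (0, 2)]):
    print(row)
```

Each column contains exactly one −1 and one 1, so every column sums to zero.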


1.1.2.4 Adjacency Matrix 


The adjacency matrix of a simple undirected graph is a square matrix whose
coefficient $a_{ij}$ is 1 if vertices i and j are adjacent and 0 otherwise.
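A minimal sketch of building this matrix, assuming vertices numbered 0 to n−1 and edges given as pairs (the helper name is hypothetical):

```python
def adjacency_matrix(n, edges):
    """Adjacency matrix of a simple undirected graph with vertices 0..n-1."""
    M = [[0] * n for _ in range(n)]
    for u, v in edges:
        M[u][v] = M[v][u] = 1   # symmetric: an undirected edge has no direction
    return M


# Usage: the path 0-1-2
for row in adjacency_matrix(3, [(0, 1), (1, 2)]):
    print(row)
```

For an undirected graph, the matrix is symmetric, and the sum of row i gives the number of neighbors of vertex i.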


1.1.2.5 Degree 


The degree of a vertex v of an undirected graph, denoted deg(v), is the number of
edges that are incident to v. A loop increases the degree of a vertex by 2. A vertex
of degree 1 is pendent. A graph is regular if all its vertices have the same degree.
For a directed graph, the outdegree of a vertex, denoted $\deg^+(v)$, is the number of arcs
having v as tail. The indegree of a vertex, $\deg^-(v)$, is the number of arcs having v
as head.
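These definitions can be computed by counting endpoints. In this sketch (hypothetical helpers; edges and arcs given as pairs), a loop `(v, v)` adds 2 to deg(v), exactly as stated above.

```python
from collections import Counter


def degrees(edges):
    """deg(v) for an undirected (multi)graph given as a list of edges (u, v)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1          # for a loop (v, v), v is counted twice
    return deg


def in_out_degrees(arcs):
    """Outdegree (arcs having v as tail) and indegree (arcs having v as head)."""
    out_deg, in_deg = Counter(), Counter()
    for tail, head in arcs:
        out_deg[tail] += 1
        in_deg[head] += 1
    return out_deg, in_deg


# Usage: a path 1-2-3 with a loop on vertex 3
print(degrees([(1, 2), (2, 3), (3, 3)]))
```

A graph is regular when `len(set(degrees(edges).values())) == 1`, provided every vertex appears in at least one edge.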


1.1.2.6 Path, Simple Path, Elementary Path, and Cycle 


A path (also referred to as a walk) is an alternating sequence of vertices and
edges, beginning and ending with a vertex, such that each edge lies between its two
endpoints in the sequence. A simple path (also referred to as a trail) is a walk whose edges
are all distinct. An elementary path (also simply referred to as a path) is a trail in which
all vertices (and therefore also all edges) are distinct. A cycle is a trail whose
first vertex is the same as its last vertex. A simple cycle is a cycle in which
the only repeated vertex is the first/last one. The length of a walk is its number of
edges. Unlike French, English uses the same wording for undirected
and directed graphs. So, edges, paths, etc. must be qualified as “directed” or
“undirected.” However, arcs are always directed edges.
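The distinctions between walk, trail, elementary path and cycle can be checked mechanically. This sketch assumes a simple undirected graph, so a walk can be described by its sequence of vertices alone; the function name and the returned labels are illustrative choices.

```python
def classify_walk(walk, edges):
    """Classify a vertex sequence in a simple undirected graph.

    Returns 'not a walk', 'walk', 'trail', 'elementary path', 'cycle'
    or 'simple cycle', following the definitions of the text.
    """
    edge_set = {frozenset(e) for e in edges}
    steps = [frozenset(p) for p in zip(walk, walk[1:])]   # edges traversed, in order
    if any(s not in edge_set for s in steps):
        return "not a walk"
    if len(set(steps)) < len(steps):
        return "walk"                                      # some edge is used twice
    if len(walk) > 1 and walk[0] == walk[-1]:              # closed trail
        inner = walk[:-1]
        return "simple cycle" if len(set(inner)) == len(inner) else "cycle"
    return "elementary path" if len(set(walk)) == len(walk) else "trail"


# Usage on a "bow tie": triangles 1-2-3 and 3-4-5 sharing vertex 3
bowtie = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 3)]
print(classify_walk([1, 2, 3, 4, 5, 3, 1], bowtie))
```

The length of the walk is simply `len(walk) - 1`, its number of edges.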
1.1.2.7 Connected Graph


An undirected graph is connected if there is a path between every pair of its vertices. 
A connected component of a graph is a maximal subset of its vertices (and incident 
edges) such that there is a path between every pair of the vertices. A directed graph 
is strongly connected if there is a directed path in both directions between any pair 
of vertices. 
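The connected components of an undirected graph can be found by breadth-first search, one of the basic graph algorithms mentioned at the start of this chapter. In this minimal sketch (hypothetical helper, graph given as vertex and edge lists), each search started from an unvisited vertex collects exactly one component.

```python
from collections import deque


def connected_components(vertices, edges):
    """Connected components of an undirected graph, by breadth-first search."""
    neighbors = {v: [] for v in vertices}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:                      # visit everything reachable from start
            v = queue.popleft()
            component.append(v)
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components


# Usage: two components, {1, 2, 3} and {4, 5}
print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```

The graph is connected exactly when the function returns a single component.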


1.1.2.8 Tree, Subgraph, and Line Graph 


A tree is a connected graph without cycles (acyclic). A leaf is a pendent vertex of a
tree. A forest is a graph without cycles; each of its connected components is a tree.
A rooted tree is a directed graph with a unique path from one vertex (the root of the
tree) to each remaining vertex.

$G' = (V', E')$ is a subgraph of $G = (V, E)$ if $V' \subseteq V$ and $E' \subseteq E$ contains only edges
with both endpoints in $V'$. When $E'$ contains all the edges of E with both endpoints in $V'$,
$G'$ is the subgraph induced by $V'$. A spanning tree of a graph G is a subgraph of G
that is a tree and contains all the vertices of G.

The line graph L(G) of a graph G is built as follows (see also Fig. 2.12): 


* Each edge of G is associated with a vertex of L(G). 
* Two vertices of L(G) are joined by an edge if their corresponding edges in G 
share an endpoint. 
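The construction of L(G) translates almost literally into code. The sketch below is a hypothetical helper that assumes a simple graph given by its edge list; it compares every pair of edges, so it runs in time quadratic in the number of edges.

```python
from itertools import combinations


def line_graph(edges):
    """Return (vertices, edges) of L(G) for a simple graph G given by its edge list.

    Each edge of G becomes a vertex of L(G); two such vertices are joined when
    the corresponding edges of G share an endpoint.
    """
    lg_vertices = [tuple(e) for e in edges]
    lg_edges = [(e, f) for e, f in combinations(lg_vertices, 2) if set(e) & set(f)]
    return lg_vertices, lg_edges


# The path a-b-c-d has the path ab - bc - cd as its line graph.
_, lg_edges = line_graph([("a", "b"), ("b", "c"), ("c", "d")])
print(lg_edges)
```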


1.1.2.9 Eulerian, Hamiltonian Graph 


A graph is Eulerian if it contains a walk that uses every edge exactly once. A graph 
is Hamiltonian if it contains a walk that uses every vertex exactly once. Sometimes, 
Eulerian and Hamiltonian graphs are limited to the case when there is a cycle that 
uses every edge or every vertex exactly once (the first/last vertex excepted). 
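Deciding whether a graph is Eulerian is easy thanks to Euler's classical theorem, which the text does not state: if all edges lie in one connected component, a closed walk using every edge exactly once exists if and only if every vertex has even degree, and a non-closed one exists if and only if exactly two vertices have odd degree. No comparably simple criterion is known for Hamiltonian graphs. A minimal sketch, with names of our own:

```python
from collections import Counter, defaultdict


def euler_status(edges):
    """Classify an undirected (multi)graph with Euler's degree criterion.

    Returns "closed" if a closed walk uses every edge exactly once,
    "open" if only a non-closed such walk exists, and "none" otherwise.
    """
    if not edges:
        raise ValueError("the graph must have at least one edge")
    degree = Counter()
    adjacency = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacency[u].add(v)
        adjacency[v].add(u)
    # All vertices incident to an edge must belong to one connected component.
    start = next(iter(adjacency))
    stack, seen = [start], {start}
    while stack:
        u = stack.pop()
        for v in adjacency[u] - seen:
            seen.add(v)
            stack.append(v)
    if len(seen) != len(adjacency):
        return "none"
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    return {0: "closed", 2: "open"}.get(odd, "none")


# The seven bridges of Königsberg: four land masses, all of odd degree.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]
print(euler_status(bridges))  # none
```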


1.1.2.10 Complete, Bipartite Graphs, Clique, and Stable Set 


In a complete graph, every two vertices are adjacent: all edges that could exist are
present. A bipartite graph G = (V, E) is such that $V = V_1 \cup V_2$, $V_1 \cap V_2 = \emptyset$, and
each edge of E has one endpoint in $V_1$ and the other in $V_2$. A clique is a maximal
set of mutually adjacent vertices that induces a complete subgraph. A stable set or
independent set is a subset of vertices that induces a subgraph without any edges. A
number of elements defined in the above paragraphs are illustrated in Fig. 1.3.
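These definitions can be checked mechanically. The following sketch (hypothetical helper names) tests bipartiteness by a breadth-first 2-coloring: neighbors are placed on opposite sides, and an edge with both endpoints on the same side reveals an odd cycle. Since the text defines a clique as a *maximal* set, `induces_complete_subgraph` only checks the completeness part, not maximality.

```python
from collections import deque
from itertools import combinations


def bipartition(vertices, edges):
    """Return (V1, V2) if the graph is bipartite, None otherwise (2-coloring by BFS)."""
    adjacency = {v: [] for v in vertices}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    side = {}
    for start in vertices:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in side:
                    side[v] = 1 - side[u]  # a neighbor goes to the other side
                    queue.append(v)
                elif side[v] == side[u]:
                    return None  # edge inside one side: the graph has an odd cycle
    v1 = {v for v in vertices if side[v] == 0}
    return v1, set(vertices) - v1


def is_stable_set(subset, edges):
    """True if no edge has both endpoints in subset."""
    return not any(u in subset and v in subset for u, v in edges)


def induces_complete_subgraph(subset, edges):
    """True if every pair of vertices of subset is joined by an edge."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(pair) in present for pair in combinations(subset, 2))
```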
Fig. 1.3 Basic definitions of graph components


1.1.2.11 Graph Coloring and Matching 


The vertex coloring problem has been used as an introductory example in
Sect. 1.1, devoted to combinatorial optimization. A proper coloring is a labeling of
the vertices of a graph by elements from a given set of colors such that distinct
colors are assigned to the endpoints of each edge. The chromatic number of a graph
G, denoted $\chi(G)$, is the minimum number of colors of a proper coloring of
G. An edge coloring is a labeling of the edges by elements from a set of colors.
The proper edge coloring problem is to minimize the number of colors required so
that two incident edges do not receive the same color; this minimum is the
chromatic index of G, denoted $\chi'(G)$. A matching is a set of edges
sharing no common endpoints. A perfect matching is a matching that matches every
vertex of the graph.
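Checking that a given coloring is proper, or that a given set of edges is a matching, takes time linear in the size of the graph; computing $\chi(G)$, on the other hand, is done below by exhaustive search over all $k^n$ colorings, which is only practical for tiny graphs. A sketch with function names of our own:

```python
from itertools import product


def is_proper_coloring(coloring, edges):
    """coloring maps each vertex to a color; endpoints of every edge must differ."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(vertices, edges):
    """Smallest k admitting a proper coloring, by exhaustive search (k**n candidates)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            if is_proper_coloring(dict(zip(vertices, colors)), edges):
                return k
    return 0  # graph without vertices


def is_matching(edge_subset):
    """No two edges of edge_subset share an endpoint."""
    endpoints = [x for edge in edge_subset for x in edge]
    return len(endpoints) == len(set(endpoints))


def is_perfect_matching(edge_subset, vertices):
    """A matching that covers every vertex (edges assumed to belong to the graph)."""
    covered = {x for edge in edge_subset for x in edge}
    return is_matching(edge_subset) and covered == set(vertices)
```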
1.1.2.12 Network


In many situations, a weight w(e) is associated with every edge e of a graph.
Typically, w(e) is a distance, a capacity, or a cost. A network, denoted R = (V, E, w),
is a graph together with a function $w : E \to \mathbb{R}$. The length of a path in a network is
the sum of the weights of its edges.


1.1.2.13 Flow 


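A classical problem in a directed network R = (V, E, w) is to assign a nonnegative
flow $x_{ij}$ to each edge e = (i, j) so that $\sum_{j \in \mathrm{Succ}(i)} x_{ij} = \sum_{k \in \mathrm{Pred}(i)} x_{ki}$ for all $i \in V$, $i \neq s, t$,
where Succ(i) and Pred(i) are the successors and predecessors of vertex i.
Vertex s is the source node and t the sink node. If $0 \le x_{ij} \le w_{ij}$ for all $(i, j) \in E$,
the flow from s to t is feasible.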
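The two feasibility conditions, capacity bounds on each arc and conservation at every vertex other than s and t, can be checked in a single pass over the arcs. A minimal sketch, assuming the capacities $w_{ij}$ and the flow $x_{ij}$ are stored in dictionaries keyed by arc (a representation of our choosing):

```python
def is_feasible_flow(capacity, flow, s, t):
    """Check 0 <= x_ij <= w_ij on every arc and conservation at every vertex except s, t.

    capacity and flow are dicts mapping an arc (i, j) to a number.
    """
    if any(arc not in capacity for arc in flow):
        return False  # flow on an arc that is not in the network
    if any(not 0 <= flow.get(arc, 0) <= w for arc, w in capacity.items()):
        return False
    balance = {}  # inflow minus outflow for each vertex
    for (i, j), x in flow.items():
        balance[i] = balance.get(i, 0) - x
        balance[j] = balance.get(j, 0) + x
    return all(b == 0 for v, b in balance.items() if v not in (s, t))


def flow_value(flow, s):
    """Net amount of flow leaving the source s."""
    out_of_s = sum(x for (i, _), x in flow.items() if i == s)
    into_s = sum(x for (_, j), x in flow.items() if j == s)
    return out_of_s - into_s


w = {("s", "a"): 3, ("a", "t"): 2, ("s", "t"): 1}
x = {("s", "a"): 2, ("a", "t"): 2, ("s", "t"): 1}
print(is_feasible_flow(w, x, "s", "t"), flow_value(x, "s"))  # True 3
```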

A cut is a partition of the vertices of a network R = (V, E, w) into two subsets
$A \subset V$ and $\bar{A} = V \setminus A$. The capacity of a cut from A to $\bar{A}$ is the sum of the weights of
the edges (i, j) with $i \in A$ and $j \in \bar{A}$.
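Computing the capacity of a cut only requires scanning the arcs. The sketch below uses a small hypothetical network with three arcs. For any feasible flow and any cut with $s \in A$ and $t \notin A$, the flow value cannot exceed the cut capacity; by the max-flow min-cut theorem, the maximum flow value equals the minimum cut capacity, which is 3 for this network.

```python
def cut_capacity(capacity, A):
    """Sum of w_ij over arcs (i, j) going from A to its complement."""
    return sum(w for (i, j), w in capacity.items() if i in A and j not in A)


w = {("s", "a"): 3, ("a", "t"): 2, ("s", "t"): 1}
for A in ({"s"}, {"s", "a"}):
    print(sorted(A), cut_capacity(w, A))
# ['s'] 4
# ['a', 's'] 3
```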

Network flows are convenient to model problems that have, at first glance, noth- 
ing in common with flows, like resource allocation problems (see, e.g., Sect. 2.5.1). 
Further in this chapter, we will review some well-known and effective algorithms
for the minimum spanning tree, the shortest path, or the optimum flow in a network. 
Other problems, like graph coloring, are intractable. The only algorithms known to 
solve them require a time that can grow exponentially with the size of the graph. 

Complexity theory focuses on classifying computational problems into easy 
and intractable ones. Metaheuristics have been designed to identify satisfactory 
solutions to difficult problems, while requiring a limited computing effort. Before 
developing a new algorithm on the basis of the principles of metaheuristics, it is 
essential to make sure that the problem addressed is intractable and that no
effective algorithm already exists to solve it. The rest of this chapter presents
some theoretical foundations for classifying problems according to their
difficulty.


1.2 Elements of Complexity Theory


The purpose of complexity theory is to classify problems in order to predict
whether they will be easy to solve. Limiting ourselves to sequential algorithms, we
consider, very roughly, that a problem is easy if it can be solved by an algorithm whose
computational effort is bounded by a polynomial function of the size
of the data to be processed. One may immediately wonder why the boundary of difficulty
should be set at the class of polynomials rather than at that of logarithmic,
trigonometric, or exponential functions.

The reason is very simple: we can easily conceive that more effort is
required to process a larger volume of data, which eliminates nongrowing functions
like trigonometric ones. For sequential methods, it is clear that each record
must be read at least once, which implies a number of operations growing at
least linearly. This eliminates logarithmic, square root, and similar functions. Naturally, for
a parallel processing of the data by several tasks, it is quite reasonable to define a
class of (very easy) problems requiring a number of operations and memory per
processor that increases at most logarithmically with the data volume. An example of
such a problem is finding the largest number of a set.

Finally, we must consider that an exponential function (in the mathematical
sense, such as $2^n$, but also extensions such as $n^{\log n}$, $n!$, or $n^n$) always grows faster
than any polynomial. This growth is incredibly impressive.

Let us examine the example of an algorithm that requires $3^{50}$ operations for a
problem with 50 elements (that is, $3^n$ operations for n elements). If this algorithm is run
on a machine able to perform $10^9$ operations per second, the machine will need about
23 million years to complete its work. By comparison, solving a problem with ten elements,
five times smaller, with the same algorithm would take only about 60 microseconds.
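These orders of magnitude are easy to check. The short computation below assumes, as above, $3^n$ operations for n elements, $10^9$ operations per second, and a year of 365.25 days:

```python
OPERATIONS_PER_SECOND = 1e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def running_time_seconds(n):
    """Running time of an algorithm performing 3**n operations."""
    return 3 ** n / OPERATIONS_PER_SECOND


years_for_50 = running_time_seconds(50) / SECONDS_PER_YEAR
microseconds_for_10 = running_time_seconds(10) * 1e6
print(f"n = 50: {years_for_50:.3g} years")                # about 2.27e+07 years
print(f"n = 10: {microseconds_for_10:.3g} microseconds")  # about 59 microseconds
```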

Hence, it would not be reasonable in practice to consider as easy a problem
requiring an exponential number of operations to be solved. However, combinatorial
problems generally have an exponential number of solutions. As a result, complete
enumeration algorithms, sometimes called “brute force,” cannot reasonably be considered
acceptable. Thus, the computation of a shortest path between two vertices of a
network cannot be performed by enumerating the complete set of all paths, since it is
exponentially large. Algorithms exploiting mathematical properties of shortest walks
must be used. These algorithms perform a number of steps that is polynomial in the
network size. On the one hand, finding a shortest walk is an easy problem. On the
other hand, finding a longest (or, in general, a shortest) elementary path (without circuits,
i.e., without visiting the same vertex twice) between two vertices is an intractable problem,
because no polynomial algorithm is known to solve it.

Finally, we must mention that the class of polynomials has an interesting
property: it is closed under composition, since the composition of two polynomials is also a polynomial.
In the context of programming, it means that a polynomial number of calls to a 
subroutine that requires a computational effort that grows polynomially with the 
data size leads to a polynomial algorithm. 


1.2.1 Algorithmic Complexity 


Complexity theory and algorithmic complexity should not be confused. As already
mentioned, complexity theory focuses on the problem classification. The purpose 
of algorithmic complexity is to evaluate the resources required to run a given 
algorithm. It is therefore possible to develop an algorithm of high complexity for 
a problem belonging to the class of “simple” problems. 

To be able to put a problem into a complexity class, we will not assume the use of 
any given algorithm to solve this problem, but we will analyze the performance of 
the best possible algorithm—not necessarily known—for this problem and running 
on a given type of machine. We must not confuse the simplicity of an algorithm
(expressed, e.g., by the number of lines of code needed to implement it) with its
complexity. Indeed, a naive algorithm can have a high algorithmic complexity.

For instance, to test whether an integer p is prime, we can try to divide it by all the
integers between 2 and $\sqrt{p}$. If all these divisions leave a remainder, we can conclude
that p is prime. Otherwise, there is a certificate (a divisor of p) proving that p is
not prime. This algorithm is easy to implement. However, it is not polynomial in the
size of the data. Indeed, just $n \approx \log_2(p)$ bits are required to code the number p.
Therefore, the algorithm requires a number of divisions proportional to $\sqrt{p} \approx 2^{n/2}$,
which is not polynomial in n.
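The trial-division algorithm just described fits in a few lines. This sketch (function name of our own) also counts the divisions, which makes the growth visible: for a prime p of n bits, about $\sqrt{p} \approx 2^{n/2}$ divisions are performed, so each additional pair of bits roughly doubles the work.

```python
import math


def trial_division(p):
    """Return (p is prime, number of divisions performed) using divisors 2..floor(sqrt(p))."""
    if p < 2:
        return False, 0
    divisions = 0
    for d in range(2, math.isqrt(p) + 1):
        divisions += 1
        if p % d == 0:
            return False, divisions  # d is a certificate that p is not prime
    return True, divisions


for p in (97, 10007, 1000003):
    is_prime, divisions = trial_division(p)
    print(p, p.bit_length(), is_prime, divisions)
```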

However, it was proven in 2002 that there is a polynomial algorithm to
determine whether a number p is prime. As we can expect, this algorithm is a
sophisticated one; its analysis and implementation are tasks at the limits
of human capabilities. So, testing whether a number is prime remains a
simple problem (because there is a polynomial algorithm to solve it). However, this
algorithm is difficult to implement and would require a prohibitive computational
time to prove that $2^{82\,589\,933} - 1$ is prime. Conversely, there are algorithms that
can theoretically degenerate but that consistently behave well in practice, like
the simplex algorithm for linear programming.

The resources required during the execution of an algorithm are limited. They 
are of several types: number of processors, memory space, and time. Looking at this 
last resource, we could measure the effectiveness of an algorithm by evaluating
its running time on a given machine. Unfortunately, this measure has many
weaknesses. First, it is relative to a particular machine, whose lifetime is limited
to a few years. Then, the way the algorithm has been implemented (programming
language, compiler, options, operating system) can notably influence its running
time. Therefore, it is preferable to measure the characteristic number of operations
that an algorithm performs. Indeed, this number does not depend on the machine
or language and can be evaluated theoretically.

We call complexity of an algorithm a function f (n) that gives the characteristic 
number of steps executed in the worst case, when it runs on a problem whose data 
size is n. It should be mentioned that this complexity has nothing to do with the
length of the code or with the difficulty of coding it. The average number of steps
is seldom used, since it is generally difficult to evaluate: it would be necessary
to average over all possible data sets. In addition, the worst-case evaluation is
essential for applications where the running time is critical.


1.2.2 Bachmann-Landau Notation 


In practice, a rough overestimate is used to evaluate the number of steps performed
by an algorithm to solve a problem of size n. Suppose that two algorithms, $\mathcal{A}_1$
and $\mathcal{A}_2$, perform, respectively, for the same problem of size n, $f(n) = 10n^2$ and
$g(n) = 0.2 \cdot n^3$ operations.
Fig. 1.4 Observed computational time for building a traveling salesman tour as a function of the
number n of cities. For instances with more than a million cities, the time remains below the
$c \cdot n \log n$ function. This is consistent with the method being in $O(n \log n)$


On the one hand, for n = 10, it is clear that $\mathcal{A}_1$ performs five times more
operations than $\mathcal{A}_2$. On the other hand, as soon as n > 50, $\mathcal{A}_2$ will perform more
steps than $\mathcal{A}_1$.


As n grows large, the $n^3$ term will come to dominate. The positive coefficients in
front of $n^2$ and $n^3$ in f(n) and g(n) become irrelevant. The function g(n) will exceed
f(n) once n grows larger than a given value. The order of a function captures its
asymptotic growth.
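The two operation counts can be compared directly. In this sketch, $0.2 \cdot n^3$ is written as $n^3/5$ so that the comparisons are exact; the search for the crossover point confirms that $\mathcal{A}_2$ performs more steps than $\mathcal{A}_1$ from n = 51 on, the two counts being equal at n = 50.

```python
def f(n):
    """Operations performed by algorithm A1."""
    return 10 * n ** 2


def g(n):
    """Operations performed by algorithm A2 (0.2 n^3 written as n^3 / 5)."""
    return n ** 3 / 5


print(f(10) / g(10))  # 5.0
crossover = next(n for n in range(1, 1000) if g(n) > f(n))
print(crossover)  # 51: from this size on, A2 performs more steps than A1
```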


1.2.2.1 Definitions


If f and g are two real functions of a real (or integer) variable n, it is said that f is
of an order lower than or equal to g if there are two positive constants $n_0$ and c such that
$\forall n > n_0$, $f(n) \le c \cdot g(n)$. This means that, up to the constant factor c, g(n) is an
upper bound on f(n) as soon as $n > n_0$. With Bachmann-Landau notation, this
is written $f(n) = O(g(n))$ or $f(n) \in O(g(n))$. This is the big O notation.

The diagram in Fig. 1.4 illustrates the usefulness of this notation. It gives the 
observed computation time to construct a traveling salesman's tour for various 
problem sizes. Given the dispersion of the measurements for small sizes, it seems
difficult to find a function expressing the exact computational time. However,
the observations for large sizes show the $n \log n$ behavior of this method, presented
in Sect. 6.3.2.
The practical interest of this notation is that it is often easy to find a function g
that increases asymptotically faster than the exact function f, which may be difficult
to evaluate. So, if the number of steps of an algorithm is smaller than $c \cdot g(n)$ for large
values of n, it is said that the algorithm runs at worst in $O(g(n))$.

Sometimes, we are not interested in the worst case but in the best case. It is said
that $f(n) \in \Omega(g(n))$ if f(n) increases asymptotically at least as fast as g(n).

Mathematically, $f(n) \in \Omega(g(n))$ if there are two positive constants $n_0$ and c such that
$\forall n \ge n_0$, $f(n) \ge c \cdot g(n)$. This is equivalent to $g(n) = O(f(n))$. This notation is
useful to show that an algorithm $\mathcal{A}_1$ is less efficient than another algorithm $\mathcal{A}_2$:
even in its best case, $\mathcal{A}_1$ performs, up to a constant factor, at least as many steps as
$\mathcal{A}_2$ in its worst case. It can also be used to show that an algorithm $\mathcal{A}$ is optimal: at worst, $\mathcal{A}$ performs
a number of steps that is not larger than the minimum number of steps required by
any algorithm to solve the problem.

If the best and the worst case are of the same order, i.e., if there are constants $c_2 \ge c_1 > 0$
and $n_0$ such that $c_1 \cdot g(n) \le f(n) \le c_2 \cdot g(n)$ for all $n \ge n_0$, then it is written $f(n) \in \Theta(g(n))$.

The $\Theta(\cdot)$ notation should be distinguished from a notion (often not well-defined)
of average complexity. Indeed, taking the example of the Quicksort algorithm to
sort n elements, we can say it is in $\Omega(n)$ and in $O(n^2)$. But this algorithm is not in
$\Theta(n \log n)$, even if its average computational time is proportional to $n \log n$.

Indeed, it can be proven that the mathematical expectation of the computational
time of Quicksort for a set of n randomly shuffled elements is proportional
to $n \log n$. The notations $\bar{O}(\cdot)$ (theoretical expected value) and $\hat{O}(\cdot)$ (empirical
average) are used later in this book. However, they are not frequently used in the
literature. To use them properly, we must specify which data set is considered,
the probability of occurrence of each problem instance, etc.
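The gap between worst case and average case is easy to observe experimentally. The sketch below assumes the simplest Quicksort variant, which takes the first element as pivot, and counts one comparison per element and partitioning step as a proxy for running time. On already sorted input it performs $n(n-1)/2$ comparisons, while on a random permutation of distinct elements the expected count is known to be $2(n+1)H_n - 4n \approx 2n \ln n$, where $H_n$ is the n-th harmonic number.

```python
import random


def quicksort_comparisons(values):
    """Sort with first-element-pivot Quicksort; return the number of element-pivot comparisons."""
    count = 0

    def sort(seq):
        nonlocal count
        if len(seq) <= 1:
            return seq
        pivot, rest = seq[0], seq[1:]
        count += len(rest)  # each remaining element is compared once with the pivot
        smaller = [v for v in rest if v < pivot]
        larger = [v for v in rest if v >= pivot]
        return sort(smaller) + [pivot] + sort(larger)

    sort(list(values))
    return count


n = 500
harmonic = sum(1 / k for k in range(1, n + 1))
print(quicksort_comparisons(range(n)), n * (n - 1) // 2)  # sorted input: worst case
shuffled = list(range(n))
random.Random(1).shuffle(shuffled)
print(quicksort_comparisons(shuffled), round(2 * (n + 1) * harmonic - 4 * n))
```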

In mathematics, and more seldom in computer science, there also exist the small
o notations:

* $f(n) \in o(g(n))$ if $\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0$
* $f(n) \in \omega(g(n))$ if $\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty$
* $f(n) \sim g(n)$ if $\lim_{n \to \infty} \frac{f(n)}{g(n)} = 1$


There are many advantages to expressing the complexity of an algorithm
with the big O notation:

* $f(n) \in O(g(n))$ means that g(n), up to a constant factor, is larger than the true complexity; this often
allows finding a function g(n) with an easy calculation, while finding g(n) such that $f(n) \in \Theta(g(n))$
would have been much more difficult.

* $25n^3 = O(8n^3)$ and $8n^3 = O(25n^3)$: two functions that differ
solely by a constant factor have the same order; this allows ignoring the
relative speed of computers; instead of writing $O(25n^3)$, we can write $O(n^3)$,
which is equivalent and simpler.

* $3n^3 + 55n = O(n^3)$: lower-order terms can be neglected; only
the highest power has to be kept.
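For the last example, explicit constants can be exhibited: $3n^3 + 55n \le 4n^3$ holds as soon as $55n \le n^3$, i.e., $n^2 \ge 55$, which is true for every $n \ge 8$. So c = 4 and $n_0 = 8$ are valid witnesses. The helper below (a name of our own) merely spot-checks such witnesses over a finite range; it is a sanity check, not a proof.

```python
def bound_holds(f, g, c, n0, n_max=10_000):
    """Check f(n) <= c * g(n) for n0 <= n <= n_max: a finite sanity check, not a proof."""
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))


def f(n):
    return 3 * n ** 3 + 55 * n


def g(n):
    return n ** 3


print(bound_holds(f, g, c=4, n0=8))  # True
print(bound_holds(f, g, c=4, n0=7))  # False: f(7) = 1414 > 4 * 343 = 1372
```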


It is important to stress that the complexity of an algorithm is a theoretical
concept, which is derived by reasoning and calculation. It can be established
with a sheet of paper and a pencil. The complexity is typically expressed by the order of the
computational time (or an abstract number of steps performed by a virtual processor)
depending on the size of the problem.

Fig. 1.5 Illustration of the growth of some functions frequently used to express the complexity of
an algorithm. The horizontal axis indicates the size of the problem (with exponential growth) and
the vertical axis gives the order of magnitude of the computation time (with iterated-exponential
growth, from a nanosecond to the expected life of our universe)

Functions commonly encountered in algorithmic complexity are given below, 
with the slower-growing functions listed first. Figure 1.5 depicts the growth of some 
of these functions. 


* $O(1)$: constant.
* $O(\log n)$: logarithmic; the base is not provided since $O(\log_a n) = O(\log_b n)$ for any constant bases a, b > 1.
* $O(n^c)$: fractional power, with 0 < c < 1.
* $O(n)$: linear.
* $O(n \log n)$: linearithmic.
* $O(n^2)$: quadratic.
* $O(n^3)$: cubic.
* $O(n^c)$: polynomial, with c > 1 constant.
* $O(n^{\log n})$: quasi-polynomial, super-polynomial, sub-exponential.
* $O(c^n)$: exponential, with c > 1 constant.
* $O(n!)$: factorial.
1.2.3 Basic Complexity Classes


Complexity theory has evolved considerably since the beginning of the 1970s, when
Cook showed that there is a problem which, if it could be solved in polynomial
time, would allow us to solve many others efficiently, like the traveling
salesman problem, integer linear programming, graph coloring, etc. [1].

To achieve this result, it was necessary to formalize in mathematical terms what a
generic problem is, how a computer works, and how computational time can
be measured. To simplify this theory as much as possible, the type of problems
considered is limited to decision problems.

A decision problem is formalized by a generic problem and a question; the 
answer should be either “yes” or “no.” 


Example of a Generic Problem 


Let $C = \{c_1, \dots, c_n\}$ be a set of n cities, $d_{ij}$ integer distances between the cities $c_i$
and $c_j$ $(i, j = 1, \dots, n)$, and B an integer bound.
Question


Is there a tour of length not higher than B visiting every city of C? Put differently,
we look for a permutation p of the elements $1, 2, \dots, n$ such that $d_{p_n p_1} + \sum_{i=1}^{n-1} d_{p_i p_{i+1}} \le B$.
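Given a candidate permutation, answering the question for that particular tour is straightforward; finding such a tour is the hard part. The sketch below, with hypothetical names, numbers the cities from 0 rather than 1 to match Python indexing, and uses a made-up four-city instance whose diagonals are rounded to the integer 2. The check runs in $O(n \log n)$ time because of the sort used to validate the permutation.

```python
def tour_length(d, p):
    """Length of the closed tour visiting the cities in the order p (0-based indices)."""
    n = len(p)
    return sum(d[p[i]][p[(i + 1) % n]] for i in range(n))


def is_yes_certificate(d, B, p):
    """True if p is a permutation of all the cities whose tour length does not exceed B."""
    return sorted(p) == list(range(len(d))) and tour_length(d, p) <= B


# Four cities on a unit square: sides of length 1, diagonals (rounded) to 2.
d = [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]
print(is_yes_certificate(d, 4, [0, 1, 2, 3]))  # True: length 4
print(is_yes_certificate(d, 4, [0, 2, 1, 3]))  # False: length 6
```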

This is the decision version of the traveling salesman problem (TSP for short). 
The optimization version of the problem seeks to find the shortest possible route 
that visits each city exactly once and returns to the origin city. This is undoubtedly 
the best-known combinatorial optimization problem that is intractable. 


1.2.3.1 Encoding Scheme, Language, and Turing Machine


A problem instance can be represented as a text file. To do so, we must adopt
conventions, for example, put n, the number of cities, on the first line, then
B, the bound, on the second line; each of the following lines then contains three
numbers, interpreted as i, j, and $d_{ij}$. Put differently, an encoding scheme is used.
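Decoding such a file is a matter of a few lines. The parser below is a hypothetical illustration of this particular encoding scheme; the whitespace-separated fields, the city numbering from 1, and the default distance of 0 for pairs that are not listed are our assumptions.

```python
def parse_tsp_instance(text):
    """Decode n, then B, then one 'i j d_ij' line per distance (cities numbered from 1)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    n = int(rows[0][0])
    bound = int(rows[1][0])
    d = [[0] * n for _ in range(n)]  # distances not listed are left at 0
    for i, j, dij in rows[2:]:
        d[int(i) - 1][int(j) - 1] = int(dij)
    return n, bound, d


instance = """3
10
1 2 4
2 1 4
1 3 3
3 1 3
2 3 5
3 2 5
"""
n, bound, d = parse_tsp_instance(instance)
print(n, bound, d)  # 3 10 [[0, 4, 3], [4, 0, 5], [3, 5, 0]]
```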

We can adopt the formalism of language theory, which is similar to the one
used in compiling techniques. Let $\Sigma$ be a finite set of symbols, or alphabet. We
write $\Sigma^*$ for the set of all strings that can be built with the alphabet $\Sigma$. An encoding
scheme e for a generic problem $\pi$ allows describing any instance I of $\pi$ by a string
$x \in \Sigma^*$. For the TSP, I contains n, B, and all the $d_{ij}$ values.
An encoding scheme e for a generic problem $\pi$ partitions the strings of $\Sigma^*$ into
three classes:


1. The strings that do not encode a problem instance I of $\pi$
2. The strings encoding a problem instance I of $\pi$ for which the answer is “no”
3. The strings encoding a problem instance I of $\pi$ for which the answer is “yes”


This last class is called the language associated with $\pi$ and e, denoted $L(\pi, e)$.

In theoretical computer science, or more precisely in automata theory, the
computing power of various machine models is studied. Among the simplest
automata are finite-state automata. They are used, for instance, to design or analyze
communication protocols. Their states are represented by the vertices of
a graph and their transitions by arcs. Given an input string, the automaton
changes from one state to another according to the symbol of the string being read
and the associated transition rules. Since an automaton maintains a finite number of
states, this machine has a bounded memory.

A slightly more complex model is the push-down automaton, which functions similarly
to a finite-state machine but has a stack. At each step, a symbol of the string is
interpreted, as well as the symbol at the top of the stack (if the stack is not empty).
The automaton changes its state and places a new symbol at the top of the stack.
This type of automaton is able to perform more complex computations. For instance,
it can recognize the strings of a context-free language. Hence, it can perform
the syntax analysis of a program described by a grammar of type 2. An even more
powerful computer model than a push-down automaton is the Turing machine.


Deterministic Turing Machine 


To mathematically represent how a computer works, Alan Turing imagined a fictitious
machine (there were no computers in 1936) whose operations can be modeled by
a transition function. This machine is able to implement all the usual algorithms. It
is able to recognize, in finite time, a string generated by a general grammar of type 0.
Figure 1.6 illustrates such a machine, composed of a program that controls the
scrolling of a magnetic tape and a read/write head.

A program for a deterministic Turing machine is specified by: 


1. A tape alphabet $\Gamma$, the set of symbols that can be written on the tape. $\Gamma$ contains
at least $\Sigma$, the set of symbols that encodes a decision problem instance, the
special blank symbol b not belonging to $\Sigma$, and possibly other control symbols.

2. A set of states Q, containing at least $q_0$, the initial state, $q_Y$, the final state
indicating that the answer to the instance is “yes,” and $q_N$, the final state
indicating that the answer is “no.”

3. A transition function $\delta : (Q \setminus \{q_Y, q_N\}) \times \Gamma \to Q \times \Gamma \times \{-1, 1\}$.


This function represents the actions to be performed by the machine when it is in
a certain state and reads a certain symbol. A Turing machine works as follows: its
initial state is $q_0$, the read/write head is positioned on cell 1, and the tape contains the
string $x \in \Sigma^*$ in cells 1 through |x| and b in all other cells.

Fig. 1.6 Schematic representation of a deterministic Turing machine, which allows modeling and
formalizing a computer

Let q be the current state of the machine, $\sigma$ the symbol read from the tape, and
$(q', \sigma', \Delta) = \delta(q, \sigma)$. One step of the machine consists of:


* Replacing $\sigma$ by $\sigma'$ in the current cell
* Moving the head one cell to the left if $\Delta = -1$ or one cell to the right if $\Delta = 1$
* Changing the internal state to $q'$


The machine stops either in state $q_Y$ or in state $q_N$. This is why the
transition function $\delta$ is only defined for nonfinal states of the machine.

Although very simple, a Turing machine can conceptually represent everything
that happens in a common computer. This is not the case for simpler machines, like
the finite-state automaton (whose head always moves in the same direction) or
the push-down automaton.


Example of a Turing Machine Program 


Let $M = (\Gamma, \Sigma, Q, \delta)$ be a Turing machine program:
Tape alphabet: $\Gamma = \{0, 1, b\}$
Input alphabet: $\Sigma = \{0, 1\}$
Set of states: $Q = \{q_0, q_1, q_2, q_3, q_Y, q_N\}$
Transition function $\delta$: given in Table 1.1
Table 1.1 Specification of the transition function δ of a Turing machine

                 Symbol σ ∈ Γ on the tape
State    0               1               b
q0       (q0, 0, 1)      (q0, 1, 1)      (q1, b, -1)
q1       (q2, b, -1)     (q3, b, -1)     (qN, b, -1)
q2       (qY, b, -1)     (qN, b, -1)     (qN, b, -1)
q3       (qN, b, -1)     (qN, b, -1)     (qN, b, -1)


1.2.3.2 Class P of Languages 


The class P (standing for polynomial) contains the problems considered easy: those
for which an algorithm can solve the problem with a number of steps bounded by a
polynomial in the instance data size (the length of the string x initially written on the
tape). More formally, this class is defined as follows: we say the machine M accepts
$x \in \Sigma^*$ if and only if M stops in the state $q_Y$. The language recognized by M is
the set of strings $x \in \Sigma^*$ such that M accepts x. We can verify that the language
recognized by the machine given by the program in Table 1.1 consists of the strings encoding
a binary number divisible by 4 (more precisely, the strings of at least two symbols ending with 00).
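This claim can be checked with a small simulator. The sketch below is a hypothetical implementation, not part of the formal model: the tape is a dictionary from cell index to symbol, unwritten cells read as the blank b, and the transitions of Table 1.1 are encoded as a dictionary. Running it on all binary numbers of at least two digits confirms that exactly the multiples of 4 are accepted, and the number of steps grows linearly with |x|, so this program runs in polynomial time.

```python
BLANK = "b"

# Table 1.1: (state, symbol read) -> (new state, symbol written, head move)
DELTA = {
    ("q0", "0"): ("q0", "0", 1), ("q0", "1"): ("q0", "1", 1), ("q0", BLANK): ("q1", BLANK, -1),
    ("q1", "0"): ("q2", BLANK, -1), ("q1", "1"): ("q3", BLANK, -1), ("q1", BLANK): ("qN", BLANK, -1),
    ("q2", "0"): ("qY", BLANK, -1), ("q2", "1"): ("qN", BLANK, -1), ("q2", BLANK): ("qN", BLANK, -1),
    ("q3", "0"): ("qN", BLANK, -1), ("q3", "1"): ("qN", BLANK, -1), ("q3", BLANK): ("qN", BLANK, -1),
}


def run_turing_machine(delta, x, max_steps=100_000):
    """Simulate a deterministic Turing machine on input x; return (final state, steps)."""
    tape = {cell: symbol for cell, symbol in enumerate(x, start=1)}  # x in cells 1..|x|
    state, head, steps = "q0", 1, 0
    while state not in ("qY", "qN"):
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        symbol = tape.get(head, BLANK)  # every other cell holds the blank symbol
        state, tape[head], move = delta[(state, symbol)]
        head += move
        steps += 1
    return state, steps


print(run_turing_machine(DELTA, "1100"))  # ('qY', 7)
print(run_turing_machine(DELTA, "110"))   # ('qN', 6)
```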

An algorithm is a program that stops for any string $x \in \Sigma^*$. The computational
time of an algorithm is the number of steps performed by the machine before it stops.
The complexity of a program M is the largest computational time $T_M(n)$ required
by the machine to stop, over all strings x of length n initially written on the
tape. A deterministic Turing machine program runs in polynomial time if there is a
polynomial p such that $T_M(n) \le p(n)$ for all $n \ge 1$.

The class P of languages includes all the languages L such that there is a
program for a deterministic Turing machine recognizing L in polynomial time. By
abuse of language, we say the problem $\pi$ belongs to the class P if the language
associated with $\pi$ and with an encoding scheme e (unspecified but supposed to
be reasonable) belongs to P. When we use the expression “there is a program,”
we mean that this program exists, without necessarily knowing how to code it.
Conversely, if we are aware of an algorithm, not necessarily the best one, running
in polynomial time for this problem, then the problem belongs to the complexity
class P.


1.2.3.3 Class NP of Languages


Informally, the class NP (standing for nondeterministic polynomial) of languages
includes all the problems for which we can verify in polynomial time that a given
solution produces the answer “yes.” For a problem to be part of this class, the
requirements are looser than for the class P. Indeed, it is not required to be able
to find a solution in polynomial time but only to be able to verify the correctness
of a given solution in polynomial time. In practice, this class also contains intractable
problems, for which no polynomial-time algorithm is known.




Fig. 1.7 Schematic representation of a nondeterministic Turing machine. This machine allows
formalizing the NP class, but does not exist in the real world


To formalize this definition, theoretical computer scientists have imagined a new
type of theoretical computer, the nondeterministic Turing machine, which has no
material equivalent in the real world. Conceptually, this machine is composed of a
module that guesses the solution of the problem and writes it into the negative-index
cells of the tape (see Fig. 1.7). This artifice allows us to overcome our ignorance
of an efficient algorithm to solve the problem: the machine simply does the job and
guesses the solution.

The specification of a program for a nondeterministic Turing machine is identical
to that of a deterministic one. Initially, the machine is in state $q_0$, the tape contains
the string x encoding the problem in cells 1 to |x|, and the program is idle. At that
time, a guessing phase starts during which the module writes arbitrary symbols in the
negative cells and stops arbitrarily. Next, the machine's program is activated, and it
works as a deterministic Turing machine.

With such a machine, a given string x can obviously generate various
computations, because of the nondeterministic character of the guessing phase.
The machine can end in the $q_N$ state even if the problem has a feasible solution.
Different runs with various computational times can end in the $q_Y$ state. But the
machine cannot end in the $q_Y$ state for a problem that has no solution.

By definition, the language $L_M$ recognized by the nondeterministic machine M
is the set of strings $x \in \Sigma^*$ such that there is at least one computation for which
x is accepted. The time needed to accept a string x is the minimum number of steps
over all computations accepting x, and the computation time $T_M(n)$ is the largest such
time over all accepted strings x of length n. The number of steps in the
guessing phase is not counted. The complexity of a program is defined in a similar
way to that of a deterministic machine.

The class NP of languages is formally defined as the set of languages L for
which there exists a program M for a nondeterministic Turing machine such that M
recognizes L in polynomial time. We stress that the name of this class
comes from “nondeterministic polynomial” and not from “non-polynomial.”
Fig. 1.8 Polynomial transformation of Problem 1 to Problem 2 in time T(n). The theory only
requires being able to carry out the operations represented with solid line arrows


Polynomial Transformation 


The notion of polynomial transformation of an initial problem into a second one is 
fundamental in the theory of complexity, because it is of substantial help for problem 
classification. Indeed, if we are able to efficiently solve the second problem—or, for 
intractable problems, if we were able to efficiently solve the second problem—and 
we know an inexpensive way of transforming the initial problem into the second 
one, then we can also effectively solve the initial problem. 

Formally, a first language $L_1 \subseteq \Sigma_1^*$ can be polynomially transformed into a
second language $L_2 \subseteq \Sigma_2^*$ if there is a function $f : \Sigma_1^* \to \Sigma_2^*$ that can be evaluated
in polynomial time by a deterministic Turing machine, such that, for every string $x \in \Sigma_1^*$,
x encodes an instance with “yes” answer ($x \in L_1$) if and only if f(x) encodes an instance
of the second problem with “yes” answer ($f(x) \in L_2$). Such a polynomial transformation is
written $L_1 \propto L_2$. We write $L_1 \propto_{T(n)} L_2$ if we want to specify the time T(n) required to evaluate f.

Figure 1.8 illustrates the principle of a polynomial transformation. When transforming
a problem into another one, we are solely concerned with the complexity of
evaluating the function f, and the “yes/no” answers of both instances must
be the same. The complexity of solving instance 2, or that of decoding a
solution of instance 1 from that of instance 2, does not matter.


Example of Polynomial Transformation 


Let us consider the problem of finding a Hamiltonian cycle in a graph (a cycle
passing exactly once through all the vertices of the graph before returning to the starting
vertex) and the traveling salesman problem. The latter is to answer the question:
is there a tour of total length no more than B? The function f transforming
a Hamiltonian cycle instance into a traveling salesman instance builds a complete
network on the same set of vertices as the graph. In the network, it associates
a weight of zero with the existing edges of the graph and a weight of one with the
edges that are missing in the graph. The bound B is zero.

There is a traveling salesman tour of length 0 if and only if there is
a Hamiltonian cycle in the initial graph. We deduce that the Hamiltonian cycle problem can be
polynomially transformed into the traveling salesman problem. It should be noted that the opposite
is not necessarily true.
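The transformation f can be written directly; it fills $n(n-1)$ weights, so it runs in $O(n^2)$ time. To check on tiny examples that the answers coincide, the sketch below also includes an exhaustive tour enumeration. That enumeration is exponential and only serves as an illustration; it is not part of the transformation. Function names and example graphs are our own.

```python
from itertools import permutations


def hamiltonian_to_tsp(vertices, edges):
    """The transformation f: complete network, weight 0 on edges of the graph, 1 otherwise; B = 0."""
    present = {frozenset(e) for e in edges}
    d = {(u, v): 0 if frozenset((u, v)) in present else 1
         for u in vertices for v in vertices if u != v}
    return d, 0


def tsp_decision_by_enumeration(vertices, d, bound):
    """Brute force: is there a tour of length <= bound? Exponential, tiny instances only."""
    first, *others = vertices
    for order in permutations(others):
        tour = [first, *order, first]
        if sum(d[(tour[k], tour[k + 1])] for k in range(len(tour) - 1)) <= bound:
            return True
    return False


square = [(1, 2), (2, 3), (3, 4), (4, 1)]  # has a Hamiltonian cycle
star = [(0, 1), (0, 2), (0, 3)]            # has none
print(tsp_decision_by_enumeration([1, 2, 3, 4], *hamiltonian_to_tsp([1, 2, 3, 4], square)))  # True
print(tsp_decision_by_enumeration([0, 1, 2, 3], *hamiltonian_to_tsp([0, 1, 2, 3], star)))    # False
```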


1.2.3.4 Class NP-Complete 


A problem $\pi$ belongs to the class NP-complete if $\pi$ belongs to NP and every
problem of NP can be polynomially transformed into $\pi$.

Starting from the definition of a polynomial transformation and noting the 
composition of two polynomials is still a polynomial, we have the following 
properties: 


* If $\pi$ is NP-complete and $\pi$ can be solved in polynomial time, then P = NP.

* If $\pi$ is NP-complete and $\pi$ does not belong to P, then P ≠ NP.

* If $\pi_1$ polynomially transforms into $\pi_2$ and $\pi_2$ polynomially transforms into $\pi_3$,
then $\pi_1$ polynomially transforms into $\pi_3$.

* If $\pi_1$ is NP-complete, $\pi_2$ belongs to NP, and $\pi_1$ polynomially transforms into
$\pi_2$, then $\pi_2$ is NP-complete.


No NP-complete problem that can be solved in polynomial time is known. It is conjectured that no such problem exists, hence it is assumed that P ≠ NP. The last property listed above is frequently used to show that a problem π2, of a priori unknown complexity, is NP-complete. For this, after checking that π2 belongs to NP, a problem π1 belonging to the NP-complete class is chosen, and a polynomial transformation of any instance of π1 into an instance of π2 is exhibited.

The definition of the NP-complete class presented above is purely theoretical. Perhaps this class is empty! It is therefore legitimate to ask whether at least one problem belongs to it. It is indeed far from obvious to find a "universal" problem of NP such that all the other problems of NP can be polynomially transformed into it. It is not possible to list all the problems of NP, let alone find a transformation of each of them into the universal problem. However, such a problem exists, and the first one shown to be NP-complete was the satisfiability problem.


Satisfiability 


Let u1, ..., um be a set of Boolean variables. A literal is a variable or its negation. A (disjunctive) clause is a finite collection of literals connected together with the logical "or" (∨). A clause is false if and only if all its literals are false. A satisfiability problem is a collection of clauses connected together with the logical "and" (∧). An instance of satisfiability is feasible if there is an assignment of values to the Boolean variables such that all the clauses are simultaneously true.

For instance, the satisfiability problem (u1 ∨ ū2) ∧ (ū1 ∨ u2) is feasible. However, (u1 ∨ u2) ∧ (u1 ∨ ū2) ∧ (ū1) ∧ (u2) is not a feasible instance. The graph coloring problem modeled with a Boolean formula at the very beginning of this chapter is a satisfiability problem.
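As an illustration, a satisfiability instance can be tested by enumerating all 2^m assignments. In this sketch, a clause is assumed to be a list of nonzero integers, k standing for u_k and −k for its negation (a common convention, not one used in the text); the function name is illustrative. This exponential enumeration is precisely what one would like to avoid.

```python
from itertools import product

def is_satisfiable(clauses, m):
    """Exhaustively test all 2^m assignments of the variables u1..um."""
    for values in product((False, True), repeat=m):
        # Literal k > 0 is true when u_k is true; literal -k when u_k is false.
        # Every clause needs at least one true literal.
        if all(any(values[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False
```

For example, is_satisfiable([[1, -2], [-1, 2]], 2) returns True, while is_satisfiable([[1, 2], [1, -2], [-1], [2]], 2) returns False: the unit clause (ū1) forces u1 to false, after which the first two clauses require u2 to be both true and false.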

In the early 1970s, Cook showed that satisfiability is NP-complete. From this result, it was quite easy to show that many other problems also belong to the class NP-complete, using the last property listed above. By the late 1970s, several hundred problems had been shown to be NP-complete.

Below is an example of a polynomial transformation of satisfiability into the stable set problem. Since any problem of NP can be transformed into satisfiability, any satisfiability instance can be transformed into a stable set instance and the stable set problem clearly belongs to NP, the latter is NP-complete.


Stable Set 


Data: a graph G = (V, E) and an integer k. Question: Is there a subset V' ⊆ V, |V'| = k such that ∀i, j ∈ V', (i, j) ∉ E (i.e., a subset of k nonadjacent vertices)?

Satisfiability is transformed into stable set as follows: 


• A vertex is associated with each literal of each clause.

• For each clause, a complete subgraph is created on its vertices.

• Incompatible literal-vertices (a variable and its negation) are connected together.

• A stable set of k vertices is searched in this graph, where k is the number of clauses.
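The construction can be sketched as follows. Literals are encoded as signed integers (k for u_k, −k for its negation), an assumption of this sketch, and each vertex is a pair (clause index, literal); the function names are illustrative. The brute-force stable set search is included only to check the equivalence on small instances.

```python
from itertools import combinations

def sat_to_stable_set(clauses):
    """Return (vertices, edges, k) following the construction described above."""
    # One vertex per literal occurrence: (clause index, literal)
    vertices = [(c, lit) for c, clause in enumerate(clauses) for lit in clause]
    edges = set()
    for a, b in combinations(range(len(vertices)), 2):
        (ca, la), (cb, lb) = vertices[a], vertices[b]
        # Same clause (complete subgraph) or a variable and its negation
        if ca == cb or la == -lb:
            edges.add((a, b))
    return vertices, edges, len(clauses)

def has_stable_set(n_vertices, edges, k):
    """Brute force: is there a set of k pairwise nonadjacent vertices?"""
    return any(all((a, b) not in edges for a, b in combinations(subset, 2))
               for subset in combinations(range(n_vertices), k))
```

A stable set of k vertices picks one literal per clause without ever picking a variable together with its negation, which is exactly a satisfying assignment.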


Such a transformation is illustrated in Fig. 1.9 for a small instance with three variables and three clauses.


Fig. 1.9 Polynomial transformation of a satisfiability instance with three clauses to a stable set
Examples of Problems of Unknown Complexity


To date, thousands of problems have been shown to be either in P or in the NP-complete class. A number of others have not yet been classified more precisely than as members of NP. Here are two examples of such problems:


• In a soccer league, each team plays every other team once. The winning team receives three points and the losing team zero points. In case of a tie, each team receives one point. Given a series of scores, one for each team, can this series be the result obtained at the end of a championship? Note: if the winner receives only two points, then there is a polynomial algorithm to answer this question.

• Is it possible to orient the edges of a graph so that it is strongly connected and each vertex has an odd indegree?


1.2.3.5 Strongly NP-Complete Class 


In some cases, instances of NP-complete problems are well solved by ad hoc algorithms. For instance, dynamic programming can handle knapsack problems (see Sect. 2.5.3) with numerous items. A condition for these instances to be easily solved is that the largest number appearing in the data is limited. For the knapsack problem, this number is its volume. By contrast, other problems cannot be solved efficiently, even if the value of the largest number appearing in the problem is limited.

We are addressing a number problem if there is no polynomial p(n) such that the largest number M appearing in the data of an instance of size n is bounded by p(n). The partition of a set into two subsets of equal weight and the traveling salesman are, therefore, number problems because, if we add one bit to the size of the problem, M can be multiplied by two. Therefore, for these problems, M can be in O(2^n), which is not polynomial.

We say that an algorithm is pseudo-polynomial if its running time is bounded by a polynomial in the size n of the data and the largest number M appearing in the problem. The partition of a set into two subsets of equal weight is an NP-complete problem for which there is a simple pseudo-polynomial algorithm.


Instance of a Partition Problem 


Is it possible to divide the set {5, 2, 1, 6, 4} into two subsets of equal weight? The sum of the weights for this partition problem instance is 18. Therefore, we look for two subsets of weight 9.

To solve this problem, we create a table with n rows, where n is the number of elements in the set, and with columns numbered 0 to M = 9, where M is half of the sum of the element weights. We then fill cells of this table with x, proceeding row by row. Using only the first element, of weight 5, we can create a subset of weight 0 (if we do not take this element) or a subset of weight 5 (if we take it). Hence, we place x in columns 0 and 5 of the first row.

Using only the first two elements, it is possible to create subsets whose weight is the same as with a single element (by not taking the second element). Hence, in the second row of the table, we copy the x of the previous row. By taking the second element, we can create subsets of weights 2 and 7. Hence, we also put x where they are in the previous row, but shifted by the weight of the second element (here: 2).

The process is then repeated until all the elements have been considered. As soon as there is an x in the last column, it is possible to create a subset of weight M. This is the case for this instance. One solution is {2, 1, 6}, {5, 4}. The complexity of the algorithm is O(M · n), which is indeed polynomial in n and M.


                   Sum of the weights
Element |  0  1  2  3  4  5  6  7  8  9
--------+------------------------------
   5    |  x              x
   2    |  x     x        x     x
   1    |  x  x  x  x     x  x  x  x
   6    |  x  x  x  x     x  x  x  x  x
   4    |  x  x  x  x  x  x  x  x  x  x
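The table-filling procedure can be sketched in Python as follows; the function name partition is illustrative. Row k of table records which weights can be reached with the first k + 1 elements, and one subset of weight M is recovered by walking back from the last cell: if weight s was already reachable without element k, that element is skipped, otherwise it must be taken.

```python
def partition(weights):
    """Pseudo-polynomial dynamic program in O(n * M), M = sum(weights) / 2.
    Returns one subset of weight M, or None if no equal-weight partition exists."""
    if not weights:
        return []
    total = sum(weights)
    if total % 2:
        return None
    M, n = total // 2, len(weights)
    # table[k][s] is True if weight s can be reached with the first k+1 elements
    table = [[False] * (M + 1) for _ in range(n)]
    table[0][0] = True
    if weights[0] <= M:
        table[0][weights[0]] = True
    for k in range(1, n):
        for s in range(M + 1):
            # Copy the previous row, plus the previous row shifted by weights[k]
            table[k][s] = table[k - 1][s] or (
                s >= weights[k] and table[k - 1][s - weights[k]])
    if not table[n - 1][M]:
        return None
    subset, s = [], M
    for k in range(n - 1, 0, -1):
        if not table[k - 1][s]:      # weight s is only reachable with element k
            subset.append(weights[k])
            s -= weights[k]
    if s:                            # the remaining weight is the first element
        subset.append(weights[0])
    return subset

print(sorted(partition([5, 2, 1, 6, 4])))  # [1, 2, 6]
```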


Let π be a number problem and π_p(n) ⊆ π the subset of its instances for which M ≤ p(n). The set π_p(n) contains only instances of π with "small" numbers. π is said to be strongly NP-complete if and only if there is a polynomial p(n) such that π_p(n) is NP-complete.

With this definition, a strongly NP-complete problem cannot be solved in pseudo-polynomial time unless P = NP. Thus, the traveling salesman problem is strongly NP-complete because the Hamiltonian cycle can be polynomially transformed into a traveling salesman instance whose distance matrix contains only 0s and 1s. Since the Hamiltonian cycle is NP-complete, traveling salesman instances involving only small numbers are also NP-complete.

Conversely, problems that can be solved by dynamic programming, like the knapsack or the partition problem, are not strongly NP-complete (unless P = NP). Indeed, if the sum of the weights of the n elements of a partition problem is bounded by a polynomial p(n), the algorithm presented above has complexity in O(n · p(n)), which is polynomial.


1.2.4 Other Complexity Classes 


Countless other complexity classes have been proposed. Among those which are 
most frequently encountered in the literature and which can be described intuitively, 
we can cite: 


NP-Hard The problems considered above are decision problems, not optimization ones. With a dichotomy (bisection) on the bound, we can easily solve the optimization problem associated with a decision problem. A problem is NP-hard if every problem of NP can be transformed into it in polynomial time. Unlike for the NP-complete class, the problem itself is not required to belong to NP. Thus, an optimization problem whose decision version is NP-complete falls into the category of NP-hard problems.

P-SPACE The problems that can be solved with a machine whose memory is bounded by a polynomial in the data size belong to the class P-SPACE. No limit is imposed here on the computational time, which can be exponential. Thus, all the problems of NP are in P-SPACE, because we can design exhaustive enumeration algorithms that require only a polynomial amount of memory. An example of a problem in this class is to determine whether a two-player deterministic game is unfair, i.e., whether player B is sure to lose if player A makes no mistakes. This problem is unlikely to be part of the class NP, because it is hard to imagine that a concise certificate can be given for solutions to problems of this class.

Class L The problems that can be solved with a machine whose working memory is bounded by a logarithm of the size of the data (disregarding the space necessary for storing the problem data) are part of the class L. This class includes problems of finding elements in databases whose size does not fit in the computer RAM.

Class NC The class NC contains the problems that can be solved in polylogarithmic time on a machine with a polynomial number of processors. The problems of this class can therefore be solved in parallel in a shorter time than that needed to read the data sequentially. Sorting the elements of an array falls under the NC class.
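The dichotomy mentioned in the NP-hard entry above can be sketched as a binary search on the bound B, assuming integer data and a monotone decision procedure (if a solution of value at most B exists, one of value at most B + 1 exists as well). The names are illustrative, and the brute-force TSP decision procedure merely stands in for any decision algorithm. The number of calls is O(log(high − low)), so a polynomial decision algorithm would yield a polynomial optimization algorithm as long as the initial bounds are at most exponential in the input size.

```python
from itertools import permutations

def tsp_decision(d, bound):
    """Decision version (brute force, tiny instances): is there a tour of length <= bound?"""
    n = len(d)
    return any(sum(d[t[k]][t[(k + 1) % n]] for k in range(n)) <= bound
               for t in ((0,) + p for p in permutations(range(1, n))))

def optimize_with_oracle(decide, low, high):
    """Smallest integer B in [low, high] with decide(B) True, assuming decide is
    monotone (False ... False True ... True) and decide(high) is True."""
    while low < high:
        mid = (low + high) // 2
        if decide(mid):
            high = mid          # a solution of value <= mid exists
        else:
            low = mid + 1       # the optimum is larger than mid
    return low

d = [[0, 1, 4, 3], [1, 0, 2, 5], [4, 2, 0, 1], [3, 5, 1, 0]]
best = optimize_with_oracle(lambda b: tsp_decision(d, b), 0, sum(map(sum, d)))
print(best)  # 7
```

The upper bound used here, the sum of all matrix entries, is valid because weights are non-negative, so any tour is at most that long.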


Few results have been established regarding the relationships between these various complexity classes. With the exception of the obvious inclusions in the broad sense L ⊆ P ⊆ NP ⊆ P-SPACE, NP-complete ⊆ NP and NC ⊆ P, the only strict inclusion established is L ⊊ P-SPACE. It is conjectured that P ≠ NP. This is one of the Millennium Prize Problems. A deeper presentation of this topic can be found in [2].


Problems 


1.1 Draw Five Segments 
Try to draw five line segments in the plane so that each segment intersects exactly three others. Formalize this problem in terms of graphs.


1.2 O Simplification 

Simplify the following expressions: 
* O(m +2") 

• O(5^n + 2^n)
• Ω(n² · n! + (n + 2)!)
• Ω(n log(log(n)) + 23n)
é O (n8™ 4. ,5-Fcos(m) 


• O(n log(n) + n^(3 − 2 sin(n)))


1.3 Turing Machine Program 

Write a deterministic Turing machine program that recognizes whether the substring ane is written on the tape. The input alphabet is Σ = {a, c, e, n}. Specify the tape alphabet Γ, the state set Q and the transition function δ.


1.4 Clique is NP-Complete 
Show that finding a clique of a given size in a graph is NP-complete. 


1.5 Asymmetric TSP to Symmetric TSP 

Show that the asymmetric traveling salesman problem—the distance from city i 
to j can be different from the distance from city j to i—can be polynomially 
transformed into the symmetric TSP by doubling the number of cities. 
Chapter 2
A Short List of Combinatorial Optimization Problems


After reviewing the main definitions of graph theory and complexity theory, this chapter presents several combinatorial optimization problems. Some of these are easy, but adding a seemingly trivial constraint can make them difficult. We also briefly review the operating principle of simple algorithms for solving some of these problems. Indeed, some of these algorithms, which produce a globally optimal solution for easy problems, have strongly inspired heuristic methods for intractable ones; in that case, they obviously do not guarantee that an optimal solution is obtained.


2.1 Optimal Trees 


Finding a connected sub-graph of optimal weight is a fundamental problem in graph 
theory. Many applications require discovering such a structure as a preliminary 
step. A typical example is the search for a minimum cost connected network (water 
pipes, electrical cables). Algorithmic solutions to this type of problem were already 
proposed in the 1930s [1, 2]. 


2.1.1 Minimum Spanning Tree 


The minimum spanning tree problem can be formulated as follows: given an undirected network R = (V, E, w) on a set V of vertices and a set E of edges with a weight function w: E → ℝ, we are looking for a connected, cycle-free subset of edges that spans all the vertices and whose total edge weight is as small as possible. Mathematically, the minimum spanning tree problem is not so simple to formulate. An integer linear program containing an exponential number of constraints is:


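$$\min z = \sum_{e \in E} w(e)\, x_e \tag{2.1}$$
$$x_e \in \{0, 1\} \quad \forall e \in E \tag{2.2}$$
$$\sum_{e \in E} x_e = |V| - 1 \tag{2.3}$$
$$\sum_{e \in E(S)} x_e \le |S| - 1 \quad \forall S \subseteq V,\ S \neq \emptyset \tag{2.4}$$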


where E(S) is the subset of edges with both ends in the vertex subset S. 

The variables x_e are constrained to be binary by (2.2). They indicate whether edge e is part of the tree (x_e = 1) or not (x_e = 0). Constraint (2.3) ensures that exactly |V| − 1 edges are selected, the number required for connecting all the vertices without a cycle. Constraints (2.4) eliminate the cycles in the solution. Such a mathematical model cannot be used as is, since the number of constraints is far too large. It can, however, be used iteratively: the problem is first solved without the cycle elimination constraints. If the solution contains a cycle on the vertices of a subset S, the constraint that eliminates it is specifically added before solving again.

Such an approach is tedious. Fortunately, there are very simple methods for finding a minimum spanning tree. The most famous algorithms to solve this problem are those of Kruskal and Prim. They are both based on a greedy method. Greedy algorithms are discussed in Sect. 4.3. They build a solution incrementally from scratch. At each step, an element is added to the structure under construction, and this choice is never revised later.

Kruskal's Algorithm 2.1 starts with a graph T = (V, E_T = ∅). It successively adds to E_T an edge of weight as low as possible while ensuring that no cycle is created.


Algorithm 2.1: (Kruskal) Building a minimum spanning tree. Efficient implementations use a special data structure for managing disjoint sets. This is required to test whether the tentative edge to add joins two vertices of the same connected component. In this case, the complexity of the algorithm is O(|E| log |E|)
Data: Undirected connected network R = (V, E, w)
Result: Minimum spanning tree T = (V, E_T)
Sort and renumber the edges by nondecreasing weight: w(e_1) ≤ w(e_2) ≤ ··· ≤ w(e_|E|)
E_T = ∅
for k = 1 ... |E| do
    if E_T ∪ {e_k} has no cycle then
        E_T ← E_T ∪ {e_k}
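A compact Python sketch of Algorithm 2.1, assuming vertices numbered 0 to n − 1 and edges given as (weight, u, v) tuples; the function name is illustrative. The disjoint-set structure uses path halving but no union by rank, which keeps the sketch short while the overall complexity remains O(|E| log |E|).

```python
def kruskal(n, edges):
    """Minimum spanning tree of a connected undirected network on vertices 0..n-1.
    edges: list of (weight, u, v). Returns the list of tree edges (weight, u, v)."""
    parent = list(range(n))               # disjoint-set forest, one tree per vertex

    def find(x):
        # Root of the component of x, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):         # nondecreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                      # different components: no cycle created
            parent[ru] = rv               # merge the two trees of the forest
            tree.append((w, u, v))
            if len(tree) == n - 1:        # spanning tree complete
                break
    return tree
```

For instance, with edges [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)], the tree retained is {0, 1}, {1, 3}, {1, 2}, of total weight 6.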
Algorithm 2.2: (Jarník) Building a minimum spanning tree. The algorithm was later rediscovered by Prim and by Dijkstra. It is commonly referred to as the Prim or Prim-Dijkstra algorithm. For an efficient implementation, an adequate data structure must be used to extract the vertex of L with the smallest weight (Line 8) and to change the weights (Line 14). A Fibonacci heap or a Brodal queue allows an implementation of the algorithm in O(|E| + |V| log |V|)

Data: Undirected connected network R = (V, E, w), a given vertex s ∈ V
Result: Minimum spanning tree T = (V, E_T)
1   forall vertex i ∈ V do
2       λ_i ← ∞   // Cost for introducing i into T
3       pred_i ← ∅   // Predecessor of i
4   λ_s ← 0; E_T ← ∅
5   L ← V   // List of vertices to introduce in T
6   while L ≠ ∅ do
8       Remove the vertex i with the smallest λ_i from L
9       if i ≠ s then
10          E_T ← E_T ∪ {pred_i, i}
11      forall vertex j adjacent to i do
12          if j ∈ L and λ_j > w({i, j}) then
14              λ_j ← w({i, j})
15              pred_j ← i


Prim's Algorithm 2.2 starts with a graph T = (V' = {s}, E_T = ∅) and successively adds a vertex v to V' and an edge e to E_T, such that the weight of e is as low as possible, one of its ends belonging to V' and the other not. Put differently, Kruskal starts with a forest with as many trees as there are vertices and seeks to merge all these trees into a single one, while Prim starts with a tree consisting of a single vertex and makes it grow until it comprises all the vertices.
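Prim's algorithm can be sketched with Python's heapq module. Since heapq offers no operation for decreasing a key, this sketch pushes a new entry each time λ_j decreases and skips outdated entries when they are popped; with a binary heap, this gives O(|E| log |V|) rather than the Fibonacci-heap bound quoted for Algorithm 2.2. The adjacency lists of (j, w) pairs and the function name are assumptions of the sketch.

```python
import heapq

def prim(adj, s=0):
    """Minimum spanning tree of a connected undirected network.
    adj[i]: list of (j, w) pairs. Returns (total weight, pred), where pred[i] is
    the tree neighbour through which i was introduced (None for s)."""
    n = len(adj)
    lam = [float('inf')] * n      # cost for introducing i into T
    pred = [None] * n
    in_tree = [False] * n         # True once i has been removed from L
    lam[s] = 0
    heap = [(0, s)]
    total = 0
    while heap:
        l, i = heapq.heappop(heap)
        if in_tree[i]:
            continue              # outdated entry: i is already in T
        in_tree[i] = True
        total += l
        for j, w in adj[i]:
            if not in_tree[j] and lam[j] > w:
                lam[j] = w
                pred[j] = i
                heapq.heappush(heap, (w, j))   # lazy replacement of a decrease-key
    return total, pred

adj = [[(1, 1), (2, 4)], [(0, 1), (2, 3), (3, 2)],
       [(0, 4), (1, 3), (3, 5)], [(1, 2), (2, 5)]]
print(prim(adj))  # (6, [None, 0, 1, 1])
```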


2.1.2 Steiner Tree 


The Steiner tree problem is very close to the minimum spanning tree problem. The sole difference is that the vertices of a subset S ⊂ V do not necessarily have to appear in the tree. S is the set of Steiner vertices. The other vertices, which must belong to the tree, are called terminal vertices. The Euclidean version of the Steiner tree problem is to connect a given set of terminal points in the plane by lines whose total length is as short as possible. Figure 2.1 shows the minimum spanning tree, which uses solely the edges directly connecting the terminals, and a Steiner tree. The weight of the minimum spanning tree may be larger than that of a Steiner tree in which appropriately chosen Steiner nodes are added. The combinatorial choice of the vertices to add makes the problem NP-hard.


Fig. 2.1 Minimum spanning tree using only terminal nodes, which are, therefore, directly connected to each other, and minimum weight Steiner tree, where additional nodes can be used


2.2 Optimal Paths 


Searching for optimal paths is as old as the world. Everyone is aware of this problem, especially since cars have been equipped with navigation systems. Knowing the current position on a transport network, the aim is to identify the best route to a given destination. The usual optimality criterion is time, but it can also be distance, especially for a walking route.


2.2.1 Shortest Path 


Formally, let R = (V, E, w) be a directed network. We want to find a shortest walk starting at node s and ending at node t. Naturally, "shortest" is an abuse of language and designates the sum of the arc weights. These weights can represent something other than a distance, such as a time, an energy consumption, etc. In terms of algorithmic complexity, it is not more expensive to find the optimal walks from a particular node s to all the vertices of V, or from all the vertices of V to s.

This formulation can be problematic for a general weighting function. Indeed, with negative weights, the shortest walk may not exist if there is a negative length circuit. Dijkstra's algorithm is the most efficient one for finding shortest paths in a network whose weighting function is non-negative: w(e) ≥ 0 ∀e ∈ E. It is formalized by Algorithm 2.3.
Algorithm 2.3: (Dijkstra) Searching for a shortest path from s to all other nodes in a network with non-negative weights. The red color highlights the two differences between this algorithm and Prim's one (Algorithm 2.2)
Data: Directed network R = (V, E, w) with w(e) ≥ 0 ∀e ∈ E, given by successor lists succ(i) for each vertex i ∈ V, a given vertex s
Result: Immediate predecessor pred_j of j on a shortest path from s to j, ∀j ∈ V, and length λ_j of the shortest path from s to j
1   forall vertex i ∈ V do
2       λ_i ← ∞
3       pred_i ← ∅
4   λ_s ← 0
5   L ← V   // Vertices for which the shortest path is not definitive
6   repeat
8       Remove the vertex i with the smallest λ_i value from L
9       forall vertex j ∈ succ(i) do
10          if j ∈ L and λ_j > λ_i + w(i, j) then
12              λ_j ← λ_i + w(i, j)
13              pred_j ← i
14  until L = ∅


The idea behind this algorithm is to store, in a set L, the vertices for which the shortest path from the starting vertex s has not yet been definitively identified. A value λ_i is associated with each vertex i. This value represents the length of an already discovered path from s to i. Since the weights are supposed to be non-negative, the node i ∈ L with the smallest value is a new vertex for which the shortest path is definitively known. The node i can, therefore, be removed from L, while checking whether its successors can be reached by a shorter path passing through i.

For an efficient implementation, an adequate data structure must be used to extract the vertex of L with the smallest value (Line 8) and to change the values of its successors (Line 12). Similarly to Prim's Algorithm 2.2, a Fibonacci heap or a Brodal queue allows an implementation of the algorithm in O(|E| + |V| log |V|).
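For sparse networks given by successor lists, the same lazy-deletion technique with Python's binary heap (heapq) gives a short sketch in O(|E| log |V|), slightly weaker than the Fibonacci-heap bound. The function name and the (j, w) successor-list format are assumptions.

```python
import heapq

def dijkstra_heap(succ, s):
    """Shortest paths from s; succ[i] is a list of (j, w) pairs with w >= 0.
    Returns (lam, pred): path lengths and immediate predecessors."""
    n = len(succ)
    lam = [float('inf')] * n
    pred = [None] * n
    done = [False] * n            # True once i has been removed from L
    lam[s] = 0
    heap = [(0, s)]
    while heap:
        _, i = heapq.heappop(heap)
        if done[i]:
            continue              # outdated entry for a vertex already settled
        done[i] = True            # lam[i] is now definitive
        for j, w in succ[i]:
            if not done[j] and lam[i] + w < lam[j]:
                lam[j] = lam[i] + w
                pred[j] = i
                heapq.heappush(heap, (lam[j], j))
    return lam, pred
```

With succ = [[(1, 4), (2, 2)], [(3, 2)], [(1, 1)], []] and s = 0, the lengths are [0, 3, 2, 5]: vertex 1 is reached more cheaply through vertex 2 than directly.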

It is interesting to highlight the significant similarity between this algorithm and Prim's Algorithm 2.2 for finding a minimum spanning tree. The recipe that worked for that problem still works, with some restrictions, for discovering a shortest path. The general framework of the greedy methods, on which this recipe is based, is presented in Sect. 4.3 of the chapter devoted to constructive methods. Code 2.1 provides an implementation of Dijkstra's algorithm for the case where the network is dense enough to be reasonably specified by a square matrix.
Code 2.1 dijkstra.py Implementation of Dijkstra's algorithm for a complete network specified by a matrix (d_ij) providing the weight of each arc (i, j). In this case, managing L with a simple array is optimal


######### Dijkstra algorithm for finding all shortest paths from start
def dijkstra(n,       # number of cities
             d,       # distance matrix (with no negative values)
             start):  # starting city

    order = [i for i in range(n)]  # Cities ordered by increasing shortest path
    pred = [start] * n             # Immediate predecessor on a shortest path from start
    length = [float('inf')] * n    # Shortest path lengths
    length[start] = 0              # Only shortest path to order[0]=start already known
    order[0], order[start] = order[start], order[0]

    for i in range(0, n - 1):      # Update shortest path for neighbors of order[i]
        for j in range(i + 1, n):  # For all neighbors to update
            if length[order[i]] + d[order[i]][order[j]] < length[order[j]]:
                length[order[j]] = length[order[i]] + d[order[i]][order[j]]
                pred[order[j]] = order[i]
            # Update order if a better i+1th shortest path is identified
            if length[order[i + 1]] > length[order[j]]:
                order[i + 1], order[j] = order[j], order[i + 1]

    return length, pred


Also note that Code 4.3 implements one of the most popular greedy heuristics for 
the traveling salesman problem. It displays exactly the same structure as Code 2.1. 

When the weights can be negative, a shortest walk exists only if there is no negative length circuit in the network. In that case, a shortest walk can be chosen as a simple path. A more general algorithm to find shortest paths was proposed by Bellman and Ford (see Algorithm 2.4). It is based on verifying, for each arc (i, j), that the Bellman condition is satisfied: λ_j ≤ λ_i + w(i, j). In other words, the length of the path from s to j should not exceed that from s to i plus the length of the arc (i, j). If it did, there would be an even shorter path to j, passing through i.

The working principle of this algorithm is completely different from the greedy algorithms we have seen so far. Rather than definitively adding an element to a partial solution at each step, the idea is to try to improve a complete starting solution. This starting solution can be a very bad one, as long as it is easy to build. The general framework of this algorithm is that of a local improvement method. At each step of the algorithm, the Bellman conditions are checked for all the arcs. If they are all satisfied, all the shortest paths have been found. If one finds a vertex j for which λ_j > λ_i + w(i, j), the best known path to node j is updated by setting λ_j to λ_i + w(i, j) and storing node i as its predecessor. Such a modification can invalidate the Bellman conditions for other arcs. It is, therefore, necessary to check again, for all arcs, whether a modification has had a domino effect.

A question arises: without further precaution, does an algorithm based on this label updating stop for any input? The answer is no: if the network has a negative length circuit, modifications go on endlessly. If the network does not have a negative length circuit, the algorithm stops after at most |V| scans of the Bellman conditions for all the arcs of E. Indeed, if a shortest path exists, its number of arcs is at most |V| − 1. Each scan of the arcs of E definitively fixes a value satisfying the Bellman condition for at least one vertex.


Algorithm 2.4: (Bellman-Ford) Finding shortest paths from s to all other nodes in any network. The algorithm indicates whether the network has a negative length circuit accessible from s, which means that the (negative) length of the shortest walk is unbounded. This algorithm is extremely simple to implement (the code is hardly longer than the pseudo-code provided here). Its complexity is in O(|E||V|)

Data: Directed network R = (V, E, w) given with an arc list, a starting node s
Result: Immediate predecessor pred_j of j on a shortest path from s to j with its length λ_j, ∀j ∈ V, or: warning message of the existence of a negative length circuit
1   forall i ∈ V do
2       λ_i ← ∞; pred_i ← ∅
3   λ_s ← 0
4   k ← 0   // Step counter
5   Continue ← true   // At least one λ modified at last step
6   while k < |V| and Continue do
7       Continue ← false
8       k ← k + 1
9       forall arc (i, j) ∈ E do
10          if λ_j > λ_i + w(i, j) then
11              λ_j ← λ_i + w(i, j)
12              pred_j ← i
13              Continue ← true
14  if k = |V| and Continue then
15      Warning: there is a negative length circuit that can be reached from s

The Bellman-Ford algorithm is based on an improvement method with a well-defined stopping criterion: if values are still updated after |V| steps, then the network has a negative length circuit and the algorithm stops. If a scan finds that the Bellman conditions are satisfied for all the arcs, then all the shortest paths are identified and the algorithm stops.
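As the caption of Algorithm 2.4 suggests, the code is hardly longer than the pseudo-code. This sketch assumes vertices numbered 0 to n − 1 and an arc list of (i, j, w) triples; the function name is illustrative. It reports a negative length circuit only if some label is still modified during the |V|-th scan.

```python
def bellman_ford(n, arcs, s):
    """Shortest paths from s in a network on vertices 0..n-1 given by an arc list
    [(i, j, w), ...]. Returns (lam, pred), or None if a negative length circuit
    can be reached from s."""
    lam = [float('inf')] * n
    pred = [None] * n
    lam[s] = 0
    for _ in range(n):                  # at most |V| scans of all the arcs
        modified = False
        for i, j, w in arcs:            # check the Bellman condition on each arc
            if lam[i] + w < lam[j]:
                lam[j] = lam[i] + w
                pred[j] = i
                modified = True
        if not modified:                # all Bellman conditions are satisfied
            return lam, pred
    return None                         # still modified during the |V|-th scan
```

For arcs [(0, 1, 4), (0, 2, 2), (2, 1, -1), (1, 3, 2)], the labels are [0, 1, 2, 3]; adding a circuit 1 → 2 → 1 of total length −2 makes the function return None.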

Seeking optimal paths appears in many applications, especially in project 
planning and scheduling. The problems that can be solved by dynamic programming 
can be formulated as finding an optimal path in a layered network. This technique 
uses the special network topology to find the solution without having to explicitly 
construct the network. 


2.2.1.1 Linear Programming Formulation of the Shortest Path 


It is relatively easy to formulate the problem of finding the shortest path from a vertex s to a vertex t in a network as a linear program. For this purpose, a variable x_ij is introduced for each arc (i, j) to indicate whether the latter is part of the shortest path. The formulation below may seem incomplete: indeed, the variables x_ij should take either the value 0 (indicating that the arc (i, j) is not part of the shortest path) or the value 1 (the arc is part of it). Yet constraints (2.8) are sufficient: if a variable receives a fractional value in the optimal solution, it means there are several shortest paths from s to t. Constraint (2.7) imposes that a unit "quantity" arrives at t. This amount can be split inside the network, but each fraction must use a shortest path. Constraints (2.6) impose that the quantity arriving at any intermediate node j must leave it. It is not required to explicitly impose that a unit quantity leaves s: such a constraint would be redundant with (2.7). The objective (2.5) is to minimize the cost of the arcs retained.


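$$\min z = \sum_{(i,j)} w(i, j)\, x_{ij} \tag{2.5}$$
$$\sum_{i=1}^{n} x_{ij} - \sum_{k=1}^{n} x_{jk} = 0 \quad \forall j \neq s, t \tag{2.6}$$
$$\sum_{i=1}^{n} x_{it} - \sum_{k=1}^{n} x_{tk} = 1 \tag{2.7}$$
$$x_{ij} \ge 0 \quad \forall i, j \tag{2.8}$$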
Another formulation of this problem is to directly look for the lengths λ_i of the shortest paths by imposing the Bellman conditions. This leads to the following linear program, which is the dual of the previous one.


$$\max \lambda_t \tag{2.9}$$
$$\text{subject to} \quad \lambda_j - \lambda_i \le w(i, j) \quad \forall i, j \tag{2.10}$$
$$\lambda_s = 0 \tag{2.11}$$


Duality plays a significant role in linear programming. Indeed, it can be shown that any feasible solution to the primal (minimization) problem has a value that cannot be lower than the value of any feasible solution to the dual. If the value of a feasible primal solution exactly reaches the value of a feasible dual solution, then both solutions are optimal. For the shortest path problem, the optimal λ_t value corresponds to the sum of the lengths of the arcs used in an optimal path from s to t.
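This duality argument yields a cheap optimality certificate: labels λ satisfying (2.10)-(2.11) together with an s-t path whose length equals λ_t prove that both are optimal. The sketch below checks this; it assumes every vertex is reachable from s (finite labels), an arc list of (i, j, w) triples without parallel arcs, and an illustrative function name.

```python
def is_certified_optimal(arcs, lam, path, s, t):
    """True if lam satisfies (2.10)-(2.11) and the s-t path (list of vertices)
    has length lam[t]; by weak duality, both are then optimal."""
    weight = {(i, j): w for i, j, w in arcs}
    dual_feasible = lam[s] == 0 and all(lam[j] - lam[i] <= w for i, j, w in arcs)
    if not dual_feasible or path[0] != s or path[-1] != t:
        return False
    primal_length = sum(weight[(path[k], path[k + 1])] for k in range(len(path) - 1))
    return primal_length == lam[t]
```

With arcs [(0, 1, 4), (0, 2, 2), (2, 1, -1), (1, 3, 2)] and λ = [0, 1, 2, 3], the path 0, 2, 1, 3 of length 3 is certified optimal, whereas the path 0, 1, 3 of length 6 is not.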


2.2.2 Elementary Shortest Path: Traveling Salesman 


The shortest walk problem is poorly defined because of negative length circuits. However, a very natural constraint can be added that makes it perfectly defined: look for the shortest elementary path from a particular node s to all the others. Recall that an elementary path visits each vertex at most once. In this case, even if there are negative length circuits, the problem has a finite solution. Unfortunately, adding this little constraint makes the problem difficult. Indeed, it can be shown that the traveling salesman problem, notoriously NP-hard, can be polynomially transformed into the elementary shortest path problem.

The traveling salesman problem (TSP) is the archetype of hard combinatorial 
optimization, on the one hand, because of the simplicity of its formulation and, 
on the other hand, because it appears in many applications, particularly in vehicle 
routing. 

The first practical application of the traveling salesman problem is, as its name suggests, finding a shortest tour for a traveling salesman. In the nineteenth century, Voigt published a book showing how to make a round trip in Germany and Switzerland [5].

There are many practical applications of this problem. For instance, Sect. 2.2.3 shows that vehicle routing implies solving many traveling salesman instances. As presented further (see Sect. 3.3.1), it can also appear in problems that have nothing to do with routing.

In combinatorial optimization, the TSP is most likely the problem that has received the most attention. Large Euclidean instances, with more than 10,000 nodes, have been solved optimally. For instances with several million cities, solutions deviating by no more than a fraction of a percent from the optimum are known. Since this problem is NP-hard, there are much smaller instances that cannot be solved by exact methods. The TSP polynomially transforms into the shortest elementary path as follows.

A vertex is duplicated into two vertices s and t: s keeps the arcs leaving the original vertex and t the arcs entering it. The weight w(i, j) of all the arcs is replaced by w(i, j) − M, where M is a positive constant larger than |V| times the largest weight of an arc, so that M exceeds the length of any tour. Since there is no arc between s and t, the shortest elementary path from s to t corresponds to a minimum length tour for the traveling salesman. Figure 2.2 illustrates the principle of this transformation. Since the TSP is NP-hard, this proves that the shortest elementary path problem is NP-hard too.
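The transformation can be checked by brute force on a tiny instance. In this sketch (illustrative names, cities numbered from 0), city 0 is duplicated into s = 0, which keeps its outgoing arcs, and t = n, which receives its incoming arcs; M is taken larger than any tour length, here n times the largest weight plus one. Both searches are exponential and only meant for a handful of cities.

```python
from itertools import permutations

def best_tour(d):
    """Length of an optimal tour, enumerating all tours that start at city 0."""
    n = len(d)
    return min(sum(d[t[k]][t[(k + 1) % n]] for k in range(n))
               for t in ((0,) + p for p in permutations(range(1, n))))

def tsp_to_elementary_path(d, M):
    """Duplicate city 0 into s = 0 (outgoing arcs) and t = n (incoming arcs), and
    subtract M from every arc weight; None marks a missing arc."""
    n = len(d)
    w = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(1, n):
            if i != j:
                w[i][j] = d[i][j] - M      # arcs between cities, or leaving s
        if i != 0:
            w[i][n] = d[i][0] - M          # arcs that entered city 0 now enter t
    return w

def shortest_elementary_path(w, s, t):
    """Brute-force depth-first enumeration of all elementary s-t paths."""
    best = float('inf')

    def dfs(v, visited, cost):
        nonlocal best
        if v == t:
            best = min(best, cost)
            return
        for u, weight in enumerate(w[v]):
            if weight is not None and u not in visited:
                dfs(u, visited | {u}, cost + weight)

    dfs(s, {s}, 0)
    return best

d = [[0, 1, 4, 3], [1, 0, 2, 5], [4, 2, 0, 1], [3, 5, 1, 0]]
n = len(d)
M = n * max(map(max, d)) + 1                # larger than any tour length
path_length = shortest_elementary_path(tsp_to_elementary_path(d, M), 0, n)
print(best_tour(d), path_length, path_length + n * M)   # 7 -77 7
```

Every Hamiltonian s-t path has n arcs, hence length (tour length) − nM, and adding nM back recovers the optimal tour length.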


2.2.2.4 Integer Linear Programs for the TSP 
There are numerous integer linear programs modeling the TSP. Two of the best 
known are presented here. 


Dantzig-Fulkerson-Johnson 


The Dantzig-Fulkerson-Johnson formulation introduces an exponential number of sub-tour elimination constraints. The binary variables x_ij take the value 1 if the arc (i, j) is used in the tour and 0 otherwise.


Fig. 2.2 Polynomial transformation from a traveling salesman into an elementary shortest path. Vertex 1 is duplicated and the weight of each edge is set to the original weight minus 50. Finding the shortest elementary path from s to t is equivalent to finding the optimal TSP tour in the original network


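$$
\begin{aligned}
\text{Minimize } \; & z = \sum_{(i,j)} w(i,j) \, x_{ij} && && (2.12)\\
& x_{ij} \in \{0, 1\} && \forall i, j && (2.13)\\
& \sum_{i=1}^{n} x_{ij} = 1 && \forall j && (2.14)\\
& \sum_{j=1}^{n} x_{ij} = 1 && \forall i && (2.15)\\
& \sum_{(i,j) \in E(S)} x_{ij} \le |S| - 1 && \forall S \subsetneq V,\ S \neq \emptyset && (2.16)
\end{aligned}
$$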


Constraints (2.14) ensure that each city is entered exactly once. Constraints (2.15) ensure that each city is left exactly once. Constraints (2.16) ensure that no proper subset S contains a sub-tour.

This linear program differs from the one for finding a minimum weight tree only in Constraints (2.14) and (2.15), which replace Constraint (2.3).
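Consider an integer solution that satisfies (2.13) to (2.15). Its selected arcs form a permutation, so the sub-tours violating (2.16) are simply the cycles of that permutation. Here is a minimal Python sketch of how to find them. It numbers cities from 0 and uses a hypothetical successor-list encoding where succ[i] = j when $x_{ij} = 1$:

```python
def find_subtours(succ):
    """Return the cycles formed by the arcs with x_ij = 1.

    succ[i] = j encodes x_ij = 1. Under (2.14) and (2.15), succ is a
    permutation, so its arcs split into disjoint cycles. A single cycle is a
    tour; otherwise each cycle S holds |S| arcs of E(S) and violates (2.16).
    """
    n = len(succ)
    seen = [False] * n
    cycles = []
    for start in range(n):
        if seen[start]:
            continue
        cycle, v = [], start
        while not seen[v]:        # follow successors until the cycle closes
            seen[v] = True
            cycle.append(v)
            v = succ[v]
        cycles.append(cycle)
    return cycles


print(find_subtours([1, 2, 0, 4, 3]))  # [[0, 1, 2], [3, 4]]: two sub-tours
print(find_subtours([2, 0, 4, 1, 3]))  # [[0, 2, 4, 3, 1]]: a single tour
```

Each returned cycle shorter than n gives a set S whose constraint (2.16) can be added to the model. This avoids enumerating all the exponentially many constraints up front.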


Miller-Tucker-Zemlin


The Miller-Tucker-Zemlin formulation replaces the exponential number of constraints (2.16) by a polynomial number of constraints, introducing $|V| - 1$ continuous variables $u_i$ ($i = 2, \dots, |V|$). The new variables provide the tour ordering: if $u_i < u_j$, then city i is visited before city j. In this formulation, constraints (2.13) to (2.15) are retained and constraints (2.16) are replaced by:

$$
\begin{aligned}
& u_i - u_j + |V| \, x_{ij} \le |V| - 1 && 2 \le i \ne j \le |V| && (2.17)\\
& 1 \le u_i \le |V| - 1 && 2 \le i \le |V| && (2.18)
\end{aligned}
$$


This integer linear program is probably not the most efficient one, but it has 
relatively few variables and constraints. 
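To see why (2.17) and (2.18) exclude sub-tours, one can search exhaustively for ordering values $u_i$ on tiny examples. In this sketch, the function name is hypothetical, cities are numbered 1 to |V|, and city 1 is the reference city. Only integer values of $u_i$ are tried, and this is enough here for two reasons:

- A tour admits integer values given by the visiting order.
- A sub-tour avoiding city 1 admits no values at all, integer or not. Summing (2.17) along its arcs gives $|C| \cdot |V| \le |C| \cdot (|V| - 1)$, a contradiction.

```python
from itertools import product


def mtz_feasible(succ):
    """Search for values u_2..u_n satisfying (2.17) and (2.18).

    succ maps city i to city j when x_ij = 1 (cities 1..n, n = len(succ)).
    Only integer values are tried, which suffices for these examples.
    Exhaustive search: tiny n only.
    """
    n = len(succ)
    # Pairs with x_ij = 0 satisfy (2.17) whatever u is, so only used arcs matter.
    used = [(i, j) for i, j in succ.items() if i != 1 and j != 1]
    for values in product(range(1, n), repeat=n - 1):      # bounds (2.18)
        u = dict(zip(range(2, n + 1), values))
        if all(u[i] - u[j] + n <= n - 1 for i, j in used):  # (2.17) with x_ij = 1
            return True
    return False


print(mtz_feasible({1: 3, 3: 2, 2: 4, 4: 1}))  # True: tour 1-3-2-4-1
print(mtz_feasible({1: 2, 2: 1, 3: 4, 4: 3}))  # False: sub-tour 3-4-3 avoids city 1
```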


2.2.3 Vehicle Routing 


Problems using the traveling salesman as a sub-problem naturally appear in the vehicle routing problem (VRP). In its simplest form, the VRP can be formulated as follows. Let V be a set of customers requesting quantities $q_i$ of goods ($i = 1, \dots, |V|$). They are delivered by a vehicle with capacity Q, starting from and returning to a warehouse d. The customers must be split into m subsets $V_1, \dots, V_m$ such that $\sum_{i \in V_j} q_i \le Q$. For each subset $V_j \cup \{d\}$ ($j = 1, \dots, m$), a traveling salesman tour as short as possible must be determined. Figure 2.3 illustrates a solution of a small VRP instance.
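Evaluating a VRP solution combines the capacity constraint with the lengths of the traveling salesman tours. The minimal Python sketch below assumes Euclidean distances and index 0 for the warehouse d. The function name and data are hypothetical.

```python
import math


def vrp_length(tours, coords, demand, Q):
    """Total length of a VRP solution, or None if a vehicle is overloaded.

    tours: one list of customers per vehicle (the subsets V_1, ..., V_m).
    coords: Euclidean coordinates; index 0 is the warehouse d.
    demand: quantities q_i; Q: vehicle capacity.
    """
    total = 0.0
    for tour in tours:
        if sum(demand[i] for i in tour) > Q:
            return None                      # capacity constraint violated
        route = [0] + tour + [0]             # start from and return to d
        total += sum(math.dist(coords[a], coords[b]) for a, b in zip(route, route[1:]))
    return total


coords = [(0, 0), (0, 3), (4, 0), (0, -3), (-4, 0)]
demand = [0, 2, 3, 2, 3]
print(vrp_length([[1, 2], [3, 4]], coords, demand, Q=5))   # 24.0
print(vrp_length([[1, 2, 3], [4]], coords, demand, Q=5))   # None
```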

This problem naturally occurs for delivering or collecting goods and in home 
service planning. In concrete applications, many complications exist: 


* The number m of tours can be fixed or minimized; 

* The maximum length of the tours can be limited; 

* The clients specify one or more time windows during which they should be 
serviced; 

* Deliveries can be split, implying multiple visits to the same client;

* A tour can both collect and deliver goods; 

* There is more than one warehouse; 

* Warehouses host heterogeneous fleets of vehicles;

* The warehouse locations can be chosen;

* etc. 


Since the problem consists in finding the order in which customers are served, it is also referred to as “Vehicle Scheduling.”


2.3 Scheduling 


Scheduling consists in determining the order in which a number of operations are processed. Processing them consumes resources, for instance, time on a machine. Operations that need to be processed in a specific order are grouped into jobs. The purpose of scheduling is to optimize resource consumption. Various optimization criteria are commonly used: minimizing the makespan, minimizing the total time, minimizing the average delay, etc. Two constraints are frequent in scheduling. A resource cannot perform several operations simultaneously, and two operations of the same job cannot be performed simultaneously.

Fig. 2.3 Vehicle routing problem instance. Trips from and to the warehouse are not drawn to avoid overloading the illustration. This solution was discovered by means of a tabu search, but it took decades before its optimality was proven. This gives an idea of the difficulty of the problem

Operations may have various features, depending on the application:


Resource An operation must take place on a given resource or subset of resources, or it may require several resources simultaneously.

Duration Processing an operation takes time, which may depend on the operating 
resource. 

Set-up time Before performing an operation, the resource requires a set-up time 
depending on the previously completed operation. 

Interrupt After an operation has started, it can be suspended before ending.

Pre-emption A resource can interrupt an operation to process another one. 

Waiting time There can be a waiting time between two successive operations of the same job, or waiting may be prohibited.

Release date An operation cannot start before its release date.

Deadline An operation cannot be processed after a given date. 
Fig. 2.4 Permutation flowshop scheduling. Gantt chart for a small instance with 4 resources and 5 jobs. Top: non-optimal schedule with the earliest starting time for each operation. Bottom: optimal schedule with the latest starting time for each operation


In addition, resources may have a variety of features. In the case of carriers, they can be mobile, which leads to collision problems. There may be several machines of the same type, machines that can perform different operations, etc.


2.3.1 Permutation Flowshop Scheduling 


A fundamental scheduling problem is the permutation flowshop. This problem occurs, for example, in an assembly line in which n jobs must be successively processed on machines 1, 2, ..., m, in that order. A job j must, therefore, undergo m operations, which take times $t_{ij}$ ($i = 1, \dots, m$, $j = 1, \dots, n$). The goal is to find the order in which the jobs are processed on the assembly line. In other words, we look for a permutation of the jobs such that the last job finishes on the last machine as early as possible. A buffer between consecutive machines can store jobs. Hence, a job may wait for the next machine to finish processing a job that arrived earlier. A convenient way to represent a scheduling solution is the Gantt chart: the x-axis represents time and the y-axis represents resources.

Figure 2.4 provides the Gantt charts of two solutions. The first is non-optimal, with each operation planned as early as possible. The second is optimal, with each operation starting as late as possible.
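The earliest-start schedule (the top chart of Fig. 2.4) follows from a simple recurrence. A job starts on a machine once it has left the previous machine and the machine has finished the previous job. Here is a Python sketch with 0-based indices and a hypothetical function name. The processing times are invented for illustration.

```python
def makespan(perm, t):
    """Makespan of `perm` when every operation starts as early as possible.

    t[i][j]: processing time of job j on machine i (0-based indices).
    Buffers between machines are unlimited, as in the text.
    """
    finish = [0] * len(t)          # finish[i]: completion of the latest job on machine i
    for j in perm:
        for i in range(len(t)):
            ready = finish[i - 1] if i > 0 else 0   # job j has left machine i - 1
            finish[i] = max(finish[i], ready) + t[i][j]
    return finish[-1]


t = [[3, 2, 4],    # machine 1
     [2, 5, 1],    # machine 2
     [4, 1, 3]]    # machine 3
print(makespan([0, 1, 2], t))   # 14
```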

For problem instances with only 2 machines, a greedy algorithm finds an optimal solution. The operations are sorted by increasing duration and put in a list, and the operation with the shortest duration is selected first. If this operation takes place on the first machine, the corresponding job is placed at the very beginning of the sequence. If it takes place on the second machine, the job is placed at the very end of the sequence. The operations of this job are then removed from the list before the next operation is examined. The sequence is thus completed by dispatching the jobs either after the block built at the beginning of the sequence or before the block built at the end. As soon as the instance has more than 2 machines, the problem is NP-hard. A mixed integer linear program for the permutation flowshop is as follows:


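$$
\begin{aligned}
\text{Minimize } \; & d_\omega && && (2.19)\\
& d_{mj} + t_{mj} \le d_\omega && j = 1, \dots, n && (2.20)\\
& d_{ij} + t_{ij} \le d_{i+1,j} && i = 1, \dots, m-1,\ j = 1, \dots, n && (2.21)\\
& d_{ij} + t_{ij} \le d_{ik} + M \, (1 - y_{jk}) && i = 1, \dots, m,\ 1 \le j < k \le n && (2.22)\\
& d_{ik} + t_{ik} \le d_{ij} + M \, y_{jk} && i = 1, \dots, m,\ 1 \le j < k \le n && (2.23)\\
& d_{ij} \ge 0 && i = 1, \dots, m,\ j = 1, \dots, n && (2.24)\\
& y_{jk} \in \{0, 1\} && 1 \le j < k \le n && (2.25)
\end{aligned}
$$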


Objective (2.19) is to minimize the makespan $d_\omega$. The variable $d_{ij}$ corresponds to the starting time of job j on machine i. Constraints (2.20) require that each job j finishes its processing on the last machine no later than the makespan. By (2.21), a job j must have finished its processing on machine i before being processed by machine i + 1. The variables $y_{jk}$ indicate whether job j is processed before job k. Only $n(n - 1)/2$ of these variables are introduced, since $y_{kj}$ would take the complementary value $1 - y_{jk}$. Constraints (2.22) and (2.23) both involve a large constant M to express disjunctive constraints: either job j is processed before job k, or k before j. If $y_{jk} = 1$, j is processed before k, and Constraints (2.23) are trivially satisfied for any machine i, provided M is large enough. Conversely, if $y_{jk} = 0$, Constraints (2.22) are trivially satisfied. Constraints (2.23) then require machine i to finish processing k before it can start processing j.
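The two-machine greedy procedure described earlier in this section is commonly known as Johnson's rule. The Python sketch below follows the description literally, with a sorted operation list and front and back blocks. It then checks the result against exhaustive enumeration on an invented 5-job instance. Function names are hypothetical.

```python
from itertools import permutations


def johnson(t1, t2):
    """Greedy sequence for the two-machine permutation flowshop.

    t1[j], t2[j]: durations of job j on machines 1 and 2.
    """
    ops = sorted([(d, 1, j) for j, d in enumerate(t1)] +
                 [(d, 2, j) for j, d in enumerate(t2)])   # increasing durations
    front, back, placed = [], [], set()
    for _, machine, job in ops:
        if job in placed:            # its other operation was examined earlier
            continue
        placed.add(job)
        if machine == 1:
            front.append(job)        # after the block at the beginning
        else:
            back.insert(0, job)      # before the block at the end
    return front + back


def makespan2(perm, t1, t2):
    """Makespan of a two-machine sequence, operations as early as possible."""
    end1 = end2 = 0
    for j in perm:
        end1 += t1[j]
        end2 = max(end2, end1) + t2[j]
    return end2


t1, t2 = [3, 5, 1, 6, 7], [6, 2, 2, 6, 5]
seq = johnson(t1, t2)
print(seq, makespan2(seq, t1, t2))                                  # [2, 0, 3, 4, 1] 24
print(min(makespan2(p, t1, t2) for p in permutations(range(5))))   # 24
```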


2.3.2 Jobshop Scheduling 


The jobshop scheduling problem is somewhat more general. Each job undergoes a 
certain number of operations, each of them being processed by a given machine. 
The operation sequence for a job is fixed, but different jobs do not necessarily have 
the same sequence and the jobs are not required to be processed by all machines. 
Fig. 2.5 Graph corresponding to the solution of a jobshop instance with three machines (Machine 1, Machine 2, Machine 3). One job undergoes 3 operations while two others have only 2. The weighting of the arcs corresponds to the duration of the corresponding operation. The arcs representing the precedence relations of the operations belonging to the same job are in dotted lines. The longest path from α to ω is shown in bold. It is referred to as the critical path


Figure 2.5 illustrates how to express this problem in terms of a graph. Each operation is associated with a vertex. Two fictitious vertex-operations are added: start (α) and end (ω). If operation j immediately follows operation i in the same job, an arc (i, j) is introduced. The length of this arc is $t_i$, the duration of operation i. Arcs of length 0 are added from the start vertex to the first operation of each job. Further arcs connect the last operation of each job to the end vertex, with a length equal to the duration of that operation.

All operations taking place on the same machine form a clique. The goal of the problem is to direct the edges of these cliques so as to minimize the length of the longest path from start to end.
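Once the edges of every machine clique are directed, the makespan is the length of the longest path from start to end. This length can be computed in a topological order of the graph. The Python sketch below rests on the following assumptions:

- Function names are hypothetical.
- Operations are identified by (job, rank) pairs.
- Only consecutive operations on each machine are linked. Since durations are nonnegative, this gives the same longest path as the full oriented clique.
- The data are invented, not those of Fig. 2.5.

If the chosen directions create a circuit, no topological order exists and the schedule is infeasible.

```python
from collections import defaultdict, deque


def jobshop_makespan(jobs, machine_order):
    """Longest path from start to end once the machine cliques are oriented.

    jobs[j]: list of (machine, duration) for the successive operations of job j.
    machine_order[m]: operations (job, rank) in their processing order on machine m.
    Returns None when the orientation creates a circuit (infeasible schedule).
    """
    ops = [(j, k) for j, job in enumerate(jobs) for k in range(len(job))]
    succ, indeg = defaultdict(list), defaultdict(int)

    def add_arc(a, b):
        succ[a].append(b)
        indeg[b] += 1

    for j, job in enumerate(jobs):                 # precedences inside a job
        for k in range(len(job) - 1):
            add_arc((j, k), (j, k + 1))
    for sequence in machine_order.values():        # oriented clique edges
        for a, b in zip(sequence, sequence[1:]):
            add_arc(a, b)

    start = dict.fromkeys(ops, 0)                  # arcs of length 0 from the start vertex
    queue = deque(op for op in ops if indeg[op] == 0)
    processed, makespan = 0, 0
    while queue:                                   # topological order
        a = queue.popleft()
        processed += 1
        end = start[a] + jobs[a[0]][a[1]][1]       # arc (a, b) has length t_a
        makespan = max(makespan, end)              # arc towards the end vertex
        for b in succ[a]:
            start[b] = max(start[b], end)
            indeg[b] -= 1
            if indeg[b] == 0:
                queue.append(b)
    return makespan if processed == len(ops) else None


jobs = [[(0, 3), (1, 2), (2, 2)],   # job 0: machines 0, 1, 2
        [(0, 2), (2, 1)],           # job 1: machines 0, 2
        [(1, 4), (2, 3)]]           # job 2: machines 1, 2
order = {0: [(0, 0), (1, 0)], 1: [(2, 0), (0, 1)], 2: [(1, 1), (2, 1), (0, 2)]}
print(jobshop_makespan(jobs, order))   # 11
```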

An integer linear program for the jobshop is as follows: 


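$$
\begin{aligned}
\text{Minimize } \; & d_\omega && && (2.26)\\
& d_i + t_i \le d_j && \forall (i, j) && (2.27)\\
& d_i + t_i \le d_\omega && \forall i && (2.28)\\
& d_i + t_i \le d_k + M \, (1 - y_{ik}) && \forall i, k \text{ on the same machine} && (2.29)\\
& d_k + t_k \le d_i + M \, y_{ik} && \forall i, k \text{ on the same machine} && (2.30)\\
& d_i \ge 0 && \forall i && (2.31)\\
& y_{ik} \in \{0, 1\} && \forall i, k \text{ on the same machine} && (2.32)
\end{aligned}
$$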


The variable $d_i$ is the starting time of operation i. The goal is to minimize the makespan $d_\omega$, the starting time of the dummy operation ω. If i precedes j in a given job, Constraints (2.27) require that operation i be completed before operation j starts. Constraints (2.28) require that all operations end no later than the end of the project. The variables $y_{ik}$ are associated with the disjunctive constraints (2.29) and (2.30). They determine whether operation i precedes operation k on the machine they share.


2.4 Flows in Networks 


The concept of flow arises naturally when considering material, people, or electricity that must be transported over a network. At each node, the equivalent of Kirchhoff's current law must hold: the amount of flow coming into a node must be equal to the amount going out of it.

The most elementary form of flow problem is as follows. Let R = (V, E, w) be a network. We seek flow values $x_{ij}$ through the arcs $(i, j) \in E$ that maximize the total flow issuing from a particular source-node s and reaching a sink-node t. Flow conservation must be respected: the sum of the flows entering a vertex must equal the sum of the flows exiting it, except for s and t. Moreover, the flows $x_{ij}$ cannot be negative and cannot exceed the positive value w(i, j) associated with the arcs. To solve this problem, Ford and Fulkerson proposed the relatively simple Algorithm 2.5.


Algorithm 2.5: (Ford and Fulkerson) Maximum flow from s to t

Input: Oriented network R = (V, E, w), a source-node s and a sink-node t
Result: Maximum flow from s to t
1 Start with a null flow in all arcs
2 repeat
3     Build the residual network R* corresponding to the current flow
4     if there is a path from s to t in R* then
5         Find the maximal possible flow from s to t in R* along this path
6         Superimpose this flow on the current flow (diminish the flow in the arcs (i, j) of R appearing as (j, i) arcs on the path in R*)
7 until there is no path from s to t in R*


It is an improvement method. It starts from a null flow, which is always feasible, and increases it at each step along a path from s to t until the optimal flow is reached. The first step of this algorithm is illustrated in Fig. 2.6. However, the method can get blocked in a situation where there is no augmenting path from s to t although the flow is not maximal.

To overcome this difficulty, note that the flow from a vertex j to a vertex i can be virtually increased by decreasing the flow from i to j. Therefore, at each stage of the algorithm, a residual network is considered.
Fig. 2.6 Representation of a flow problem in terms of graphs (arc capacities of 1 and 2). An empty triangle indicates an unused flow capacity. A unit flow passing through an arc is indicated by a filled triangle. The Ford and Fulkerson algorithm starts from a null flow (top) and finds the largest increase along a path from s to t. For this example, the first path discovered is s → 1 → 2 → t. After augmenting the flow along this path, there is no direct augmenting path (bottom)

Fig. 2.7 Residual network associated with the flow of Fig. 2.6


The residual network is built as follows. An arc (i, j) with capacity w(i, j) and a flow $x_{ij}$ passing through it is replaced by two arcs. The first goes from vertex i to j with capacity $w(i, j) - x_{ij}$, and exists only if this value is strictly positive. The second goes from j to i with capacity $x_{ij}$. Figure 2.7 illustrates this principle.

Once a flow is found in the residual network, it is superimposed on the flow 
obtained previously. This is shown in Fig. 2.8. 

The complexity of this algorithm depends on the network size, since a path from s to t has to be sought for each flow increase. It also depends on the number of augmenting paths. Unfortunately, the increase can be marginal in the worst case: for networks with integer capacities, it can be only 1. Each augmentation costs $O(|E| + |V|)$. Let the maximum capacity of an arc be m. The value of a maximum flow is at most the total capacity of the arcs leaving s, that is, at most $m \cdot (|V| - 1)$. Hence, the complexity of the algorithm is in $O(m \cdot |V| \cdot (|E| + |V|))$. If m is small, for example, if the capacity of all the arcs is 1, the Ford and Fulkerson algorithm is fast.
Fig. 2.8 The flow found in the residual network (Fig. 2.7) is superimposed on the previous flow (Fig. 2.6). The subset A ⊂ V separates s from t, and the sum of the capacities of all the arcs going out of A is equal to the value of the flow. This proves the optimality of the flow


We will see in Sect. 2.8.1 how to solve the edge coloring problem of a bipartite graph by solving maximum flow problems in a network where all the capacities are 1. The complexity of the Ford and Fulkerson algorithm can be significantly reduced by using a breadth-first search as a sub-algorithm to discover a path from s to t in the residual network. Hence, the flow is increased along a shortest path, in number of arcs, at each step. This improvement was proposed by Edmonds and Karp.

The number of arcs of the augmenting path cannot decrease from one step to the next. Consequently, no more than |E| steps are performed with a given number of arcs. Since the number of arcs of a path is between 1 and |V|, the complexity is reduced to $O(|V| \cdot |E|^2)$. For a dense network, with $|E| \in O(|V|^2)$, the complexity simplifies to $O(|V|^5)$. Many algorithms have been proposed for solving the maximum flow problem. For general networks, the algorithmic complexity has recently been reduced to $O(|V| \cdot |E|)$.
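The Python sketch below implements Algorithm 2.5 with a breadth-first search, as suggested by Edmonds and Karp. The residual network is stored as a dictionary of residual capacities. Augmenting along a path decreases the residual capacity of its arcs and increases that of the reverse arcs, which is the superimposition of Line 6. The network representation, function name and data are assumptions of this sketch.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Value of a maximum flow from s to t (Ford and Fulkerson with BFS).

    capacity[i][j] = w(i, j) for every arc (i, j) of the network.
    """
    residual = {}                                   # residual capacities of R*
    for i, heads in capacity.items():
        for j, w in heads.items():
            residual.setdefault(i, {})
            residual[i][j] = residual[i].get(j, 0) + w
            residual.setdefault(j, {}).setdefault(i, 0)   # reverse arc (j, i)
    value = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:            # breadth-first search in R*
            i = queue.popleft()
            for j, r in residual.get(i, {}).items():
                if r > 0 and j not in parent:
                    parent[j] = i
                    queue.append(j)
        if t not in parent:
            return value                            # no path from s to t in R*
        path, j = [], t
        while parent[j] is not None:
            path.append((parent[j], j))
            j = parent[j]
        delta = min(residual[i][j] for i, j in path)   # maximal flow along the path
        for i, j in path:                           # superimpose it on the current flow
            residual[i][j] -= delta
            residual[j][i] += delta
        value += delta


network = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
print(max_flow(network, "s", "t"))   # 5
```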

For many applications, each unit of flow in arc (i, j) costs c(i, j). We therefore consider a network R = (V, E, w, c), where w(i, j) is the capacity of the arc (i, j) and c(i, j) the cost of a unit of flow through this arc. This raises the problem of finding a maximum flow at minimum cost. This problem can be solved with Algorithm 2.6 of Busacker and Gowen, provided the network does not contain a negative cost circuit.


Algorithm 2.6: (Busacker and Gowen) Maximum flow from s to t with minimum cost

Input: Oriented network R = (V, E, w, c) without negative circuit, a source-node s and a sink-node t
Result: Maximum flow with minimum cost from s to t
1 Start with a null flow in all the arcs
2 repeat
3     Build the residual network R* relative to the current flow
4     if a path from s to t exists in R* then
5         Find the maximal possible flow through the shortest path from s to t in R*
6         Superimpose this flow on the current flow
7 until there is no path from s to t in R*


As noted for the algorithms of Prim and Dijkstra, there is a very slight difference between Algorithms 2.5 and 2.6: the latter augments the flow along a shortest path with respect to the costs. Once more, we do not alter a winning formula! When constructing the residual network, the costs must be taken into account. Suppose there is a flow $x_{ij} > 0$ through the arc (i, j). The residual network then includes an arc (i, j) with capacity $w(i, j) - x_{ij}$ and unchanged cost c(i, j), provided this capacity is positive. It also includes a reversed arc (j, i) with capacity $x_{ij}$ and cost −c(i, j).
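Here is a Python sketch of Algorithm 2.6. The residual network contains arcs of negative cost, so this sketch computes shortest paths with the Bellman-Ford algorithm. That choice belongs to the sketch; the text does not prescribe it. Each arc is stored together with the index of its reverse arc so that flows can be superimposed. Names and data are illustrative.

```python
def min_cost_max_flow(arcs, s, t):
    """Busacker and Gowen: augment along cheapest paths of the residual network.

    arcs: list of (i, j, w, c), an arc (i, j) of capacity w and unit cost c.
    Assumes no negative cost circuit. Returns (flow value, total cost).
    """
    graph = {}      # graph[i] = list of [j, residual capacity, cost, index of reverse arc]
    for i, j, w, c in arcs:
        graph.setdefault(i, [])
        graph.setdefault(j, [])
        graph[i].append([j, w, c, len(graph[j])])
        graph[j].append([i, 0, -c, len(graph[i]) - 1])   # reverse arc of cost -c
    value = cost = 0
    while True:
        # Bellman-Ford: residual arcs may have negative costs
        dist = dict.fromkeys(graph, float("inf"))
        dist[s], parent = 0, {}
        for _ in range(len(graph) - 1):
            changed = False
            for i in graph:
                if dist[i] == float("inf"):
                    continue
                for k, (j, r, c, _rev) in enumerate(graph[i]):
                    if r > 0 and dist[i] + c < dist[j]:
                        dist[j], parent[j] = dist[i] + c, (i, k)
                        changed = True
            if not changed:
                break
        if dist[t] == float("inf"):
            return value, cost                   # no path from s to t in R*
        delta, j = float("inf"), t
        while j != s:                            # maximal flow along the shortest path
            i, k = parent[j]
            delta = min(delta, graph[i][k][1])
            j = i
        j = t
        while j != s:                            # superimpose it on the current flow
            i, k = parent[j]
            graph[i][k][1] -= delta
            graph[j][graph[i][k][3]][1] += delta
            j = i
        value += delta
        cost += delta * dist[t]


arcs = [("s", "a", 2, 1), ("s", "b", 1, 2), ("a", "b", 1, 1),
        ("a", "t", 1, 3), ("b", "t", 2, 1)]
print(min_cost_max_flow(arcs, "s", "t"))   # (3, 10)
```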

In the general case, when the network may contain negative cost circuits, finding the maximum flow with minimum cost is NP-hard. Indeed, the TSP can be polynomially transformed into this problem. The transformation is similar to that of the shortest elementary path (see Fig. 2.2).

The algorithms for finding the optimal flows presented above can solve many 
problems directly related to flow management, like electric power distribution or 
transportation problems. However, they are chiefly exploited for solving assignment problems (see the next chapter for modeling the linear assignment as a flow problem).


2.5 Assignment Problems 


Assignment or matching problems occur frequently in practice. They consist in matching the elements of two different sets, such as teachers to classes, symbols to keyboard keys, or tasks to employees.


2.5.1 Linear Assignment 


The linear assignment problem can be formalized as follows. Let $C = (c_{iu})$ be an $n \times n$ matrix of costs. Each element $i \in I$ must be assigned to an element $u \in U$ ($i, u = 1, \dots, n$) in such a way that the sum of costs (2.33) is minimized. This problem can be modeled by an integer linear program:


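$$
\begin{aligned}
\text{Minimize } \; & \sum_{i=1}^{n} \sum_{u=1}^{n} c_{iu} \, x_{iu} && && (2.33)\\
\text{Subject to } \; & \sum_{i=1}^{n} x_{iu} = 1 && u = 1, \dots, n && (2.34)\\
& \sum_{u=1}^{n} x_{iu} = 1 && i = 1, \dots, n && (2.35)\\
& x_{iu} \in \{0, 1\} && i, u = 1, \dots, n && (2.36)
\end{aligned}
$$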


Constraints (2.34) ensure that each element of U is assigned exactly one element of I. Constraints (2.35) ensure that each element of I is assigned exactly one element of U. Hence, these two sets of constraints ensure a perfect matching between the elements of I and U. The integrality constraints (2.36) prevent elements of I from sharing fractions of elements of U.

A more concise formulation of the linear assignment problem is to find a permutation p of the n elements of the set U which minimizes $\sum_{i=1}^{n} c_{i p_i}$. The value $p_i$ is the element of U assigned to i.
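The permutation formulation translates directly into an exhaustive search. This Python sketch, with a hypothetical function name and invented costs, only illustrates the model. It enumerates all n! permutations, whereas the linear assignment problem can be solved in polynomial time, for instance by the Hungarian method.

```python
from itertools import permutations


def linear_assignment(c):
    """Permutation p minimizing sum_i c[i][p[i]], by exhaustive enumeration."""
    n = len(c)
    best = min(permutations(range(n)), key=lambda p: sum(c[i][p[i]] for i in range(n)))
    return list(best), sum(c[i][best[i]] for i in range(n))


c = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
print(linear_assignment(c))   # ([1, 0, 2], 5)
```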


2.5.2 Generalized Assignment 


In some cases, a perfect matching is not needed. This is particularly the case if the sizes of the sets I and U differ. To fix ideas, let I be a set of n tasks to be performed by a set U of m employees, with m < n. If employee u performs task i, the cost is $c_{iu}$ and the employee needs a time $w_{iu}$ to perform this task. Each employee u has a time budget limited to $t_u$.

This problem, called the generalized assignment problem, occurs in various 
practical situations. For instance, it is closely related to the distribution of the loads 
between vehicles for the vehicle routing problems presented in Sect. 2.2.3. The 
generalized assignment problem can be modeled by the integer linear program: 


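$$
\begin{aligned}
\text{Minimize } \; & \sum_{i=1}^{n} \sum_{u=1}^{m} c_{iu} \, x_{iu} && && (2.37)\\
\text{Subject to } \; & \sum_{i=1}^{n} w_{iu} \, x_{iu} \le t_u && u = 1, \dots, m && (2.38)\\
& \sum_{u=1}^{m} x_{iu} = 1 && i = 1, \dots, n && (2.39)\\
& x_{iu} \in \{0, 1\} && i = 1, \dots, n,\ u = 1, \dots, m && (2.40)
\end{aligned}
$$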


This small modification of the assignment problem makes it NP-hard. 
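For very small instances, model (2.37) to (2.40) can be checked by enumeration. The sketch considers all $m^n$ ways of giving each task to one employee, which enforces (2.39). It then discards those that exceed a time budget (2.38). Names and data in this Python sketch are hypothetical.

```python
from itertools import product


def generalized_assignment(c, w, budget):
    """Exhaustive search: returns (cost, assignment) or None if infeasible.

    c[i][u], w[i][u]: cost and time if employee u performs task i;
    budget[u]: time budget t_u of employee u.
    """
    n, m = len(c), len(budget)
    best = None
    for assign in product(range(m), repeat=n):      # assign[i] = employee of task i
        load = [0] * m
        for i, u in enumerate(assign):
            load[u] += w[i][u]
        if all(load[u] <= budget[u] for u in range(m)):   # constraints (2.38)
            cost = sum(c[i][u] for i, u in enumerate(assign))
            if best is None or cost < best[0]:
                best = (cost, list(assign))
    return best


c = [[1, 3], [2, 2], [3, 1], [2, 4]]
w = [[2, 3], [3, 2], [2, 2], [3, 1]]
print(generalized_assignment(c, w, budget=[5, 4]))   # (6, [0, 1, 1, 0])
```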


2.5.3 Knapsack 


A special case of the generalized assignment problem (see Exercise 2.9) is the knapsack problem. It is certainly the simplest NP-hard problem to formulate in terms of integer linear programming:


$$
\begin{aligned}
\text{Maximize } \; & \sum_{i=1}^{n} c_i \, x_i && && (2.41)\\
\text{Subject to } \; & \sum_{i=1}^{n} v_i \, x_i \le V && && (2.42)\\
& x_i \in \{0, 1\} && i = 1, \dots, n && (2.43)
\end{aligned}
$$

Items of volume $v_i$ and value $c_i$ ($i = 1, \dots, n$) can be put into a knapsack of volume V. The volume of the items put in the knapsack cannot be larger than V. The value of the selected items must be maximized.
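When volumes are integers, a classical way to solve small knapsack instances exactly is dynamic programming over the volume. That technique is not discussed in the text. Its running time in $O(n \cdot V)$ is pseudo-polynomial, so it does not contradict the NP-hardness of the problem. A Python sketch with a hypothetical function name and invented data:

```python
def knapsack(values, volumes, V):
    """0-1 knapsack by dynamic programming over integer volumes (O(n V) time).

    best[v]: largest total value of a subset of the items examined so far
    whose total volume does not exceed v.
    """
    best = [0] * (V + 1)
    for c, vol in zip(values, volumes):
        for v in range(V, vol - 1, -1):   # decreasing v: each item used at most once
            best[v] = max(best[v], best[v - vol] + c)
    return best[V]


print(knapsack([6, 10, 12], [1, 2, 3], 5))   # 22: items 2 and 3
```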

This problem is used in this book to illustrate the working principles of a few 
methods. The reader interested in knapsack problems and extensions like bin- 
packing, subset-sum, and generalized assignment can refer to [4]. 


2.5.4 Quadratic Assignment 


In another assignment problem, the elements of the set I interact with each other: an assignment chosen for an element $i \in I$ has repercussions on all the elements of I. Let us take the example of assigning n offices to a set of n employees.

In the linear assignment problem, the values $c_{iu}$ only measure the interest for employee i of being assigned office u. Assigning office u to employee i has no other consequence than making office u unavailable for another employee. In practice, employees are required to collaborate, so they have to move from one office to another. Let $a_{ij}$ be the frequency at which employee i meets employee j, and let $b_{uv}$ be the travel time from office u to office v. If office u is assigned to employee i and office v to employee j, employee i loses, on average, a time $a_{ij} \cdot b_{uv}$ for traveling. Minimizing the total time lost can be modeled by the following quadratic 0-1 program. The variable $x_{iu}$ takes the value 1 if employee i occupies office u and the value 0 otherwise:


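$$
\begin{aligned}
\text{Minimize } \; & \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{u=1}^{n} \sum_{v=1}^{n} a_{ij} \, b_{uv} \, x_{iu} \, x_{jv} && && (2.44)\\
\text{Subject to } \; & \sum_{i=1}^{n} x_{iu} = 1 && u = 1, \dots, n && (2.45)\\
& \sum_{u=1}^{n} x_{iu} = 1 && i = 1, \dots, n && (2.46)\\
& x_{iu} \in \{0, 1\} && i, u = 1, \dots, n && (2.47)
\end{aligned}
$$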


This formulation brings out the quadratic nature of the objective, due to the product of the variables $x_{iu} \cdot x_{jv}$. Constraints (2.45) to (2.47) are typical of assignment problems. Hence, this problem is called the quadratic assignment problem. A more concise model searches for a permutation p that minimizes

$$
\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} \, b_{p_i p_j}
$$

Many practical applications can be formulated as a quadratic assignment problem (QAP):


Allocation of offices to employees This is the example given above.

Allocation of blocks in an FPGA A Field Programmable Gate Array requires connecting logic blocks on a silicon chip. These blocks allow implementing logic equations, multiplexers, or memory elements. Configuring an FPGA starts by establishing how the modules must be connected. This can be described by means of a routing matrix $A = (a_{ij})$, which gives the number of connections between modules i and j. Next, each module i must be assigned a logic block $p_i$ on the chip. Since the signal propagation delay depends on the length of the links, the assignment must be carefully performed. Let $b_{uv}$ be the length of the link between logic blocks u and v. Minimizing the sum of the propagation times is then a quadratic assignment problem.

Configuring a keypad To enter text on a cellular phone keypad, the 26 letters of the alphabet, as well as the space, have been assigned to the keys 0, 2, 3, ..., 9. As standard, these 27 signs are distributed according to the configuration of Fig. 2.9a.

Assume the following timings. Pressing a key takes one unit of time, and moving from one key to another takes two units. Before typing a new symbol positioned on the same key, we have to wait 6 units of time. Then it takes 70 units of time to type the text “a ce soir bisous.”

Indeed, it takes 1 unit to enter the “a” on key 2. Moving to key 0 then takes two units, and pressing once for the space takes 1 unit. Moving back to key 2 takes 2 units, typing “c” takes 3 units, etc. With the keyboard optimized for the French language given in Fig. 2.9b, it takes only 51 units of time, more than a quarter less. This optimized keyboard was obtained by solving a quadratic assignment problem. Its $a_{ij}$ coefficients represent the frequency of occurrence of symbol j after symbol i in a typical text. The $b_{uv}$ coefficients represent the time between typing a symbol placed in position u and another placed in position v.
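The 70-unit count can be reproduced with a short Python sketch of the timing model. This sketch makes two assumptions beyond the text:

- The standard layout of Fig. 2.9a is the usual one, with the space alone on key 0 and letters in alphabetical order on keys 2 to 9.
- The 6-unit wait replaces the move cost when two consecutive symbols share a key.

The names STANDARD and typing_time are hypothetical. The optimized layout of Fig. 2.9b is not reproduced here.

```python
# Assumed standard keypad: the space alone on key 0, letters on keys 2 to 9.
STANDARD = {"0": " ", "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
            "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}


def typing_time(text, layout, press=1, move=2, wait=6):
    """Time to type `text`: `press` units per key press, `move` units to reach
    another key, `wait` units before a new symbol on the same key."""
    position = {}                                  # symbol -> (key, number of presses)
    for key, symbols in layout.items():
        for rank, symbol in enumerate(symbols, start=1):
            position[symbol] = (key, rank)
    total, previous_key = 0, None
    for symbol in text:
        key, presses = position[symbol]
        if previous_key is not None:
            total += wait if key == previous_key else move
        total += presses * press
        previous_key = key
    return total


print(typing_time("a ce soir bisous", STANDARD))   # 70
```

In QAP terms, a layout is a permutation of the symbols over key positions. Evaluating it on a large text gives the objective that the optimization minimizes.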
Fig. 2.9 Standard cellular phone keyboard and keyboard optimized for the French language. (a) Standard keyboard. (b) Optimized keyboard


The quadratic assignment problem is NP-hard. In practice, it is one of the most difficult problems of this class: some instances of size n = 30 have not been solved optimally. Many NP-hard problems can be transformed into quadratic assignment problems. Examples include the traveling salesman, the linear ordering, the graph bipartition, and the stable set problems. Naturally, modeling one of these problems as a quadratic assignment is certainly not the most efficient way to solve it!
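The permutation model of the QAP can be evaluated directly, and tiny instances can be solved by enumeration. In this Python sketch, the function names are hypothetical and the meeting frequencies and travel times are invented.

```python
from itertools import permutations


def qap_cost(p, a, b):
    """Objective sum_i sum_j a[i][j] * b[p[i]][p[j]] of the permutation p."""
    n = len(p)
    return sum(a[i][j] * b[p[i]][p[j]] for i in range(n) for j in range(n))


def qap_brute_force(a, b):
    """Best permutation by enumerating all n! of them (tiny n only)."""
    return min(permutations(range(len(a))), key=lambda p: qap_cost(p, a, b))


a = [[0, 5, 1],   # a[i][j]: how often employee i meets employee j
     [5, 0, 2],
     [1, 2, 0]]
b = [[0, 4, 1],   # b[u][v]: travel time from office u to office v
     [4, 0, 2],
     [1, 2, 0]]
p = qap_brute_force(a, b)
print(p, qap_cost(p, a, b))   # (0, 2, 1) 26
```

The two employees who meet most often (0 and 1) end up in offices 0 and 2, the closest pair of offices.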


2.6 Stable Set 


Finding the largest independent set (maximum stable set) is a classical graph theory problem. This problem is NP-hard. Section 1.2.3.4 presents a polynomial transformation of satisfiability into stable set. The latter is equivalent to finding the largest subset of mutually adjacent vertices, a maximum clique, in the complementary graph. A variant of the maximum stable set is the maximum weight stable set, where weights are associated with vertices. In this case, we are looking for a subset of independent vertices whose sum of weights is as high as possible. Naturally, if the weights are all the same, this variant is equivalent to the maximum stable set.
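The definition can be made concrete with an exhaustive search, feasible only for very small graphs. In this Python sketch, the function name is hypothetical and the example is a 5-cycle with invented weights.

```python
from itertools import combinations


def max_weight_stable_set(n, edges, weight):
    """Exhaustive search for a maximum weight stable set on vertices 0..n-1.

    A subset is stable when no edge joins two of its vertices. With all
    weights equal, the result is a maximum stable set.
    """
    adjacent = {frozenset(e) for e in edges}
    best, best_set = 0, []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if all(frozenset(pair) not in adjacent for pair in combinations(subset, 2)):
                total = sum(weight[v] for v in subset)
                if total > best:
                    best, best_set = total, list(subset)
    return best, best_set


cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(max_weight_stable_set(5, cycle5, [1, 1, 1, 1, 1]))   # (2, [0, 2])
print(max_weight_stable_set(5, cycle5, [1, 5, 1, 1, 1]))   # (6, [1, 3])
```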

This problem appears in several practical applications: map labeling, berth allocation to ships, or assigning flight levels to aircraft. This is discussed in Sect. 3.3.3.


2.7 Clustering 


Like graph theory, clustering is a very useful modeling tool with a myriad of applications. Let us quote social network analysis, medical imaging, market segmentation, anomaly detection, and data compression. Clustering, or unsupervised classification, consists in grouping items that are similar and separating those that are not. There are specific algorithms to perform these tasks automatically. Figure 2.10 gives an example of a large clustering instance that requires a decomposition method, such as those presented in Sect. 6.4.2. Image compression by vector quantization involves dealing with instances with millions of elements and thousands of clusters.

Fig. 2.10 Compression by vector quantization. An image compression technique creates clustering instances with millions of items and thousands of clusters. Here, the initial image was divided into blocks of b = 3 × 5 pixels. We next looked for a palette of $2^k$ “colors” seen as vectors of length b × 3, each pixel being characterized by its red, green, and blue brightness. For this image, we chose k = 14. Two of the $2^{14} = 16{,}384$ “colors” are shown. The palette was found with a clustering method, each color being a centroid. Each block of the initial image is replaced by the most similar centroid. As a block of the compressed image can be represented by k bits, k/b bits are enough to encode one pixel

Supervised classification considers labeled items. It is frequently performed with artificial neural networks. These techniques are outside the scope of this book, as are phylogenetic trees, popularized by Darwin in the nineteenth century.

Creating clusters supposes that we can quantify the dissimilarity $d(i, j) \ge 0$ between two elements i and j belonging to the set E we are trying to classify. Often, the function d(i, j) is a distance, but not necessarily. A distance satisfies symmetry, $d(i, j) = d(j, i)$, separation, $d(i, j) = 0 \Leftrightarrow i = j$, and the triangle inequality, $d(i, k) \le d(i, j) + d(j, k)$. However, to guarantee the stability of the algorithms, let us suppose that $d(i, j) \ge 0$ and $d(i, i) = 0$, $\forall i, j \in E$. As soon as we have such a function, the homogeneity of a group $G \subset E$ can be measured. Several definitions have been proposed. Figure 2.11 shows the optimal points associated with some of these measures for a group of 3 elements.


Diameter of a group Maximum value of the function d(i, j) for two entities i and j belonging to G: $\max_{i, j \in G} d(i, j)$.

Star Sum of the dissimilarities between the most representative element of G and the others: $\min_j \sum_{i \in G} d(i, j)$. When this element j must be in G, j is called a medoid. For instance, suppose the elements are characterized by two numeric values and $G = \{(0, 1), (3, 3), (5, 0)\}$. With the $\ell_2$ norm, the point j of $\mathbb{R}^2$ minimizing the sum of the dissimilarities is $(5/2 + 1/\sqrt{12},\ 1/2 + 5/\sqrt{12})$. With the squared $\ell_2$ norm, it is the point (8/3, 4/3). In general, there is no analytical formula for the central point with the $\ell_2$ norm, although it can be estimated numerically. For the squared $\ell_2$ norm, the best point is the center of gravity or centroid, that is, the mean of the measurements on each coordinate. The medoid of G is (3, 3).

Radius Maximum dissimilarity between the most representative element j and another element of G: $\min_j \max_{i \in G} d(i, j)$. This element is not necessarily part of G. For instance, we can take the median on each characteristic, or the point which minimizes any dissimilarity function. In the numerical example above, the median of the measures is (3, 1), which does not belong to G. Taking the ordinary distance ($\ell_2$ norm) or the squared distance (squared $\ell_2$ norm) as a dissimilarity measure, the point (5/2, 1/2) minimizes the radius of G.

Clique Sum of dissimilarities between all pairs of elements of G: $\sum_{i, j \in G,\ i < j} d(i, j)$.

Fig. 2.11 Optimal points for various homogeneity measures of a group with 3 elements in $\mathbb{R}^2$. Among others, the figure marks the gravity center, which minimizes the squared $\ell_2$ star sum, and the median of the measures
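The numerical example can be verified with a few lines of Python. The helper names below are hypothetical. The sketch computes the diameter and the clique sum, the medoid and the centroid. It also checks three claims:

- The point $(5/2 + 1/\sqrt{12}, 1/2 + 5/\sqrt{12})$ beats the centroid for the $\ell_2$ star sum.
- The centroid wins for the squared norm.
- The point (5/2, 1/2) is at the same distance $\sqrt{26}/2$ from all three elements.

```python
import math

G = [(0, 1), (3, 3), (5, 0)]          # numerical example of the text


def star(center, points, d):
    return sum(d(p, center) for p in points)


def radius(center, points, d):
    return max(d(p, center) for p in points)


def squared(p, q):
    return math.dist(p, q) ** 2


euclid = math.dist                     # ordinary distance (l2 norm)
diameter = max(euclid(p, q) for p in G for q in G)
clique = sum(euclid(p, q) for k, p in enumerate(G) for q in G[k + 1:])
medoid = min(G, key=lambda c: star(c, G, euclid))
centroid = tuple(sum(x) / len(G) for x in zip(*G))
fermat = (5 / 2 + 1 / math.sqrt(12), 1 / 2 + 5 / math.sqrt(12))

print(diameter, clique)                                        # about 5.099 and 12.31
print(medoid, centroid)                                        # (3, 3) and about (8/3, 4/3)
print(star(fermat, G, euclid) < star(centroid, G, euclid))     # True
print(star(centroid, G, squared) < star(fermat, G, squared))   # True
print(radius((2.5, 0.5), G, euclid), math.sqrt(26) / 2)        # both about 2.5495
```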


Several definitions have been proposed to measure heterogeneity existing 
between two groups G and H: 


Separation Minimum distance between two elements belonging to different groups: $\min_{i \in G,\ j \in H} d(i, j)$.

Cut Sum of dissimilarities between elements of two different groups: $\sum_{i \in G} \sum_{j \in H} d(i, j)$.

Normalized cut Average of the dissimilarities between elements of two different groups: $\sum_{i \in G} \sum_{j \in H} d(i, j) / (|G| \cdot |H|)$.
Once a criterion of homogeneity or heterogeneity has been defined, we can formulate the problem of classification into p groups $G_1, \dots, G_p$ as an optimization problem using a global objective:


* Maximize the smallest separation (or the smallest cut) between elements of 
different groups; 

* Minimize the largest diameter (or the largest radius, or the largest clique or even 
the largest star) of a group; 

* Minimize the sum of the stars (or the sum of the diameters, radii, or cliques).


2.7.1 k-Medoids or p-Median 


The k-medoids problem is one of the best known in unsupervised classification. In location theory and statistics, the terms p-median or k-medians are frequently used instead of k-medoids. With the definitions presented above, the problem is to minimize the sum of the stars. In other words, we have to find k elements $c_1, \dots, c_k$ of E minimizing $\sum_{i \in E} \min_{r = 1, \dots, k} d(i, c_r)$. This problem is NP-hard.

A well-known heuristic algorithm is Partition Around Medoids (PAM, Algorithm 2.7).
This algorithm is a local search improvement method. Various authors have
proposed variations of this heuristic while still calling it PAM, which causes some
confusion. The method originally proposed [3] is a local search with a best improvement
policy (see Sect. 5.1.2). This method requires an initial position of the centers.
Different authors have suggested various methods to build an initial solution.
Generally, greedy algorithms are used (see Sect. 4.3). Algorithm 2.7 does not specify
how this initial solution is obtained; it simply assumes that one is provided.
A random solution works perfectly well.




Algorithm 2.7: (PAM) Local search for clustering around medoids

Input: Set $E$ of items with a dissimilarity function $d(i, j)$ between items $i$ and $j$; $k$
medoids $c_1, \ldots, c_k \in E$
Result: Clusters $G_1, \ldots, G_k \subset E$
1 repeat
2   forall item $i \in E$ do
3     Assign $i$ to the closest medoid, creating clusters $G_1, \ldots, G_k \subset E$
4   forall medoid $c_j$ do
5     forall item $i \in E$ do
6       Compute the improvement (or the loss) of a solution where $c_j$ is moved on item $i$
7   if a strictly positive improvement is found then
8     Move the medoid on the item inducing the largest improvement
9 until no strict improvement is found
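For readers who want to experiment (for instance with Problem 2.6), here is a minimal Python transcription of Algorithm 2.7. It is a sketch under stated assumptions: items are the indices `0..n-1`, `d` is a square dissimilarity matrix given as a list of lists, and the cost of each trial move (Line 6) is recomputed from scratch in $O(k \cdot n)$ rather than with the faster update described in the next paragraph. The helper names are hypothetical.

```python
def total_cost(d, medoids):
    """Sum of the stars: each item contributes its dissimilarity to the closest medoid."""
    return sum(min(d[i][c] for c in medoids) for i in range(len(d)))


def pam(d, medoids, eps=1e-12):
    """Best-improvement local search for k-medoids (Algorithm 2.7).

    d: square dissimilarity matrix; medoids: list of k initial item indices.
    Returns (medoids, clusters, cost).
    """
    n = len(d)
    medoids = list(medoids)
    cost = total_cost(d, medoids)
    while True:                                    # Line 1: repeat
        best_gain, best_move = eps, None
        for r in range(len(medoids)):              # Line 4: each medoid c_r
            for i in range(n):                     # Line 5: each item i
                if i in medoids:
                    continue
                trial = medoids[:r] + [i] + medoids[r + 1:]
                gain = cost - total_cost(d, trial)     # Line 6
                if gain > best_gain:
                    best_gain, best_move = gain, (r, i)
        if best_move is None:                      # Line 9: no strict improvement
            break
        r, i = best_move                           # Line 8: apply the best move
        medoids[r] = i
        cost = total_cost(d, medoids)
    clusters = [[] for _ in medoids]               # Lines 2-3: final assignment
    for i in range(n):
        r = min(range(len(medoids)), key=lambda m: d[i][medoids[m]])
        clusters[r].append(i)
    return medoids, clusters, cost


pts = [0, 1, 2, 10, 11, 12]
d = [[abs(a - b) for b in pts] for a in pts]
print(pam(d, [0, 1]))   # ([4, 1], [[3, 4, 5], [0, 1, 2]], 4)
```

The small tolerance `eps` is an addition that prevents cycling on floating-point ties; with integer dissimilarities it has no effect.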
The PAM algorithm has a complexity in $\Omega(k \cdot n^2)$. The computation of the cost of
the new configuration on Line 6 of the algorithm requires an effort proportional to $n$.
Indeed, for each element not associated with $c_j$, it is necessary to check whether the new
medoid $i$ is closer than its current one. For the elements previously associated with the
medoid $c_j$, the best medoid is either the second-closest medoid of the previous configuration,
which can be precomputed and stored at Line 3, or the new medoid $i$ being tried.
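The $O(n)$ evaluation of Line 6 described above can be sketched as follows. This is an illustration under assumptions: `owner[i]` is the index of the medoid closest to item `i`, and `nearest[i]` and `second[i]` are its dissimilarities to the closest and second-closest medoids, all computed once at Line 3 (at least two medoids are needed). Function names are hypothetical.

```python
def nearest_two(d, medoids):
    """Line 3 precomputation: closest medoid index, closest and second-closest dissimilarities."""
    owner, nearest, second = [], [], []
    for i in range(len(d)):
        order = sorted(range(len(medoids)), key=lambda r: d[i][medoids[r]])
        owner.append(order[0])
        nearest.append(d[i][medoids[order[0]]])
        second.append(d[i][medoids[order[1]]])
    return owner, nearest, second


def swap_delta(d, r, cand, owner, nearest, second):
    """Cost change, in O(n), if medoid number r is moved onto item cand.

    Items owned by medoid r go either to their second-closest medoid or to
    cand; the other items keep their medoid unless cand is closer.
    A negative value means an improvement.
    """
    delta = 0
    for i in range(len(d)):
        kept = second[i] if owner[i] == r else nearest[i]
        delta += min(kept, d[i][cand]) - nearest[i]
    return delta


pts = [0, 1, 2, 10, 11, 12]
d = [[abs(a - b) for b in pts] for a in pts]
owner, nearest, second = nearest_two(d, [0, 1])
print(swap_delta(d, 0, 4, owner, nearest, second))   # -27: the cost drops from 31 to 4
```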

The number of repetitions of the loop ending at Line 9 is difficult to assess.
In practice, however, it is relatively low: it grows more or less linearly with $k$
(there is a high probability that each center will be repositioned once) and
sub-linearly with $n$. If we want a number of clusters $k$ proportional to $n$ (for
instance, to decompose the set $E$ into clusters comprising a fixed number of
elements on average), the complexity of Algorithm 2.7 is higher than $n^3$.
Thus, the algorithm becomes unusable as soon as the number of elements exceeds
a few thousand.


2.7.2 k-Means 


When the items are vectors of real numbers and the dissimilarity is the square of
the distance ($\ell_2^2$), the point $\mu$ that minimizes the star criterion of a group $G$
is the arithmetic mean of the elements of $G$ (its center of gravity). The k-means
heuristic (Algorithm 2.8) is probably the best-known clustering algorithm.


Algorithm 2.8: (k-means) Local search improvement method for clustering
items of $\mathbb{R}^d$ into $k$ groups. The dissimilarity measure is the squared distance ($\ell_2^2$)

Input: Set $E$ of items in $\mathbb{R}^d$ with $\ell_2^2$ measuring the dissimilarity between items; $k$
centers $c_1, \ldots, c_k \in \mathbb{R}^d$
Result: Clusters $G_1, \ldots, G_k \subset E$
1 repeat
2   forall item $i \in E$ do
3     Assign $i$ to its nearest center, creating clusters $G_1, \ldots, G_k \subset E$
4   forall $j \in 1, \ldots, k$ do
5     $c_j \leftarrow$ gravity center of $G_j$
6 until no center has moved
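A compact Python sketch of Algorithm 2.8 follows. Assumptions: items and centers are tuples of numbers, and an empty group keeps its previous center, a case the pseudocode leaves open. The `max_iter` guard is an addition for safety, and the function names are hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance (the l2^2 dissimilarity)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(items, centers, max_iter=1000):
    """Alternate the assignment (Line 3) and repositioning (Line 5) steps."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for x in items:                                   # Line 3: nearest center
            j = min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
            clusters[j].append(x)
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c   # Line 5
            for g, c in zip(clusters, centers)
        ]
        if new_centers == centers:                        # Line 6: no center has moved
            break
        centers = new_centers
    return centers, clusters


items = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(kmeans(items, [(0, 0), (10, 0)])[0])   # [(0.0, 0.5), (10.0, 0.5)]
```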


Like the PAM algorithm, this is a local search improvement method. It
starts with centers already placed; frequently, the centers are randomly positioned.
It alternates an assignment step of the items to their nearest center (Line 3) and
an optimal repositioning step of the centers (Line 5). The algorithm stops when
all items are optimally assigned and all centers are optimally positioned with respect
to their assigned items. This algorithm is relatively fast, in $\Omega(k \cdot n)$. Unfortunately, it is
extremely sensitive to the initial positions of the centers as well as to isolated items. If
the items are widely dispersed, another dissimilarity measure should be used. For instance,
the centers can be optimally repositioned with respect to the ordinary distance ($\ell_2$ norm)
at Line 5; repositioning a center then amounts to solving Weber's problem. By placing the
center $c_j$ on the medoid item of group $G_j$ at Line 5, a faster variant of Algorithm 2.7 is obtained.
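The last variant only changes the repositioning step: each center becomes the medoid of its group. Here is a hedged sketch working on a dissimilarity matrix, as for PAM; the function name is hypothetical, and a group that becomes empty keeps its medoid (an assumption).

```python
def alternate_medoids(d, medoids, max_iter=1000):
    """k-means-style alternation where Line 5 places each center on the medoid of its group."""
    medoids = list(medoids)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for i in range(len(d)):                               # assignment step
            r = min(range(len(medoids)), key=lambda m: d[i][medoids[m]])
            clusters[r].append(i)
        new = [min(g, key=lambda c: sum(d[i][c] for i in g)) if g else m
               for g, m in zip(clusters, medoids)]           # medoid of each group
        if new == medoids:                                    # no medoid has moved
            break
        medoids = new
    return medoids, clusters


pts = [0, 1, 2, 10, 11, 12]
d = [[abs(a - b) for b in pts] for a in pts]
print(alternate_medoids(d, [0, 1]))   # ([1, 4], [[0, 1, 2], [3, 4, 5]])
```

Each iteration costs about $O(k \cdot n)$ for the assignment plus, for each group, a quadratic medoid search in its size, instead of the $O(k \cdot n^2)$ swap neighborhood explored by PAM.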


2.8 Graph Coloring 


Coloring the edges or the vertices of a graph allows us to represent
problems where incompatible items must be separated. Two compatible items
can receive the same "color", while they must be colored differently if they are
incompatible. Therefore, a color represents a class of compatible elements. In
edge coloring, two edges sharing an incident vertex must receive different
colors. In vertex coloring, two adjacent vertices must receive different colors.

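An edge coloring can be transformed into a vertex coloring of the line graph: the line
graph $L(G)$ has one vertex per edge of $G$, two vertices being adjacent when the
corresponding edges share an endpoint. Building the line graph $L(G)$ from the graph $G$
is illustrated in Fig. 2.12.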
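The line graph construction takes only a few lines of Python. Assumption: $G$ is given as a list of edges `(u, v)`, and vertex `e` of $L(G)$ is the position of the corresponding edge in that list; the function name is hypothetical.

```python
from itertools import combinations


def line_graph(edges):
    """Adjacency sets of L(G): edges of G become vertices, adjacent when they share an endpoint."""
    incident = {}                            # vertex of G -> indices of its incident edges
    for e, (u, v) in enumerate(edges):
        incident.setdefault(u, []).append(e)
        incident.setdefault(v, []).append(e)
    adj = [set() for _ in edges]
    for es in incident.values():
        for e, f in combinations(es, 2):     # every pair of edges meeting at this vertex
            adj[e].add(f)
            adj[f].add(e)
    return adj


print(line_graph([("a", "b"), ("b", "c"), ("c", "d")]))   # [{1}, {0, 2}, {1}]
```

A proper vertex coloring of the returned graph, read back through the edge indices, is a proper edge coloring of $G$.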

The vertex coloring problem is to find the chromatic number of the graph, that is,
to minimize the number of colors of a feasible coloring. This problem is
NP-hard in the general case. However, the edge coloring of a bipartite graph can be
solved in polynomial time.


2.8.1 Edge Coloring of a Bipartite Graph 


It is clear that the vertices of a bipartite graph can be colored with two colors.
Coloring the edges of a bipartite graph is a bit more complicated, but an optimal
edge coloring can still be found in polynomial time. For this, we begin by completing
the bipartite graph $G = (V = X \cup Y, E)$ by adding vertices to the smaller subset,
$X$ or $Y$, so that both contain the same number of vertices. Then, while keeping the
graph bipartite, edges are added so that all its vertices have the same degree. This
degree equals the largest vertex degree of $G$.

Fig. 2.12 Proper edge coloring of a graph corresponding to the proper vertex coloring of its line
graph

Fig. 2.13 Maximum load problem in a road network

Let us call $G'$ the bipartite graph so obtained. A perfect matching can be found in
$G'$ by solving a maximum flow problem (see Sect. 2.5). The edges of this matching
receive color number 1. Then, the edges of this matching are removed from $G'$ to
obtain the graph $G''$. The latter has the same properties as $G'$: it is bipartite, both
subsets contain the same number of vertices, and all vertices have the same degree.
So a perfect matching can be found in $G''$, the edges of this matching receiving
color number 2. The process is iterated until no edge remains in the graph. The
coloring so obtained is optimal for $G$ (and for $G'$) because the number of colors
used equals the highest vertex degree of $G$, and all edges incident to a vertex of
highest degree must receive different colors. See also Problem 3.3.
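The procedure above can be sketched in Python as follows. Assumptions: vertices of $X$ and $Y$ are numbered from 0, edges are pairs `(x, y)`, and the regularization step may need parallel (dummy) edges, so the graph is handled as a multigraph. Each perfect matching is found with Kuhn's augmenting-path algorithm, substituted here for the maximum flow formulation cited in the text (both compute maximum bipartite matchings). The function name is hypothetical, and the recursion limits it to small graphs.

```python
def bipartite_edge_coloring(nx, ny, edges):
    """Color the edges of a bipartite graph with Delta colors (Delta = max degree).

    Returns a list giving, for each edge (x, y) of `edges`, a color in 1..Delta.
    """
    n = max(nx, ny)                              # pad the smaller side
    deg_x, deg_y = [0] * n, [0] * n
    for x, y in edges:
        deg_x[x] += 1
        deg_y[y] += 1
    delta = max(deg_x + deg_y, default=0)
    multi = list(edges)                          # real edges first, then dummy ones
    y = 0
    for x in range(n):                           # add dummy edges up to degree delta
        while deg_x[x] < delta:
            while deg_y[y] == delta:
                y += 1
            multi.append((x, y))
            deg_x[x] += 1
            deg_y[y] += 1

    adj = [[] for _ in range(n)]                 # x -> indices of its edges
    for e, (x, _) in enumerate(multi):
        adj[x].append(e)
    alive = [True] * len(multi)
    color = [0] * len(multi)

    def augment(x, seen, match_y):
        """Kuhn's augmenting path search from x; match_y[y] is an edge index."""
        for e in adj[x]:
            y = multi[e][1]
            if alive[e] and y not in seen:
                seen.add(y)
                if match_y[y] is None or augment(multi[match_y[y]][0], seen, match_y):
                    match_y[y] = e
                    return True
        return False

    for c in range(1, delta + 1):                # one perfect matching per color
        match_y = [None] * n
        for x in range(n):
            augment(x, set(), match_y)           # succeeds: the remaining graph is regular
        for e in match_y:
            color[e] = c
            alive[e] = False                     # remove the matching from the graph
    return color[:len(edges)]


# A star with three edges needs three colors
print(bipartite_edge_coloring(1, 3, [(0, 0), (0, 1), (0, 2)]))
```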


Problems 


2.1 Connecting Points 

A set $V$ of points on the Euclidean plane must be connected. How should we proceed to
minimize the total length of the connections? Application: consider the 3 points
(0, 0), (30, 57), and (66, 0).


2.2 Accessibility by Lorries 
In the road network of Fig. 2.13, the maximum load (in tons) is given for each edge. 
What is the weight of the heaviest lorry that can travel from A to B? 


2.3 Network Reliability 
Figure 2.14 gives a communication network where connections are subject to
breakdowns. The reliability of the connections is given for each edge. How should
we transmit a message from the vertex $s$ to all the others with the highest possible
reliability?

Fig. 2.14 Reliability in a network


2.4 Ford and Fulkerson Algorithm Degeneracy 
Show that the Ford and Fulkerson algorithm for finding a maximum flow is not 
polynomial. 


2.5 TSP Permutation Model 
Model the TSP as the search for an optimal permutation.


2.6 PAM and k-Means Implementation 

Implement Algorithms 2.7 and 2.8 by initializing the k medoids or the k centers with 
the first k items. Investigate both methods on randomly generated problems in the 
unit square with n = 100, 200, 400, 1000, 2000 items and k = 5, 10, 20, 50, n/20 
centers. Estimate the empirical complexity of the algorithms. Compare the quality 
of the solutions obtained by Algorithm 2.8 when the k centers are initially placed 
on the medoids found by Algorithm 2.7 rather than randomly choosing them (with 
the k first items). 


2.7 Optimality Criterion 
Prove that the schedule given at the bottom of Fig. 2.4 is optimal. 


2.8 Flowshop Makespan Evaluation 

Given the processing time $t_{ij}$ of object $i$ on machine $j$, how can the earliest
ending time $f_{ij}$ and the latest starting time $d_{ij}$ be evaluated for a permutation flowshop?
The jobs are processed in the order given by the permutation $p$.


2.9 Transforming the Knapsack Problem into the Generalized Assignment 
Knowing that the knapsack problem is NP-hard, show that the generalized assignment
problem is also NP-hard.
Chapter 3
Problem Modeling


In all fields, it is essential to choose a good model for the problem to be addressed. 
Indeed, the best solution method will be useless if it is given inappropriate data or 
constraints. Let us illustrate this on the Steiner tree problem. Two simple models
come naturally to mind:


* Steiner nodes to retain A solution can be represented by the Steiner nodes
belonging to the tree; knowing these nodes, the tree is constructed by applying
Prim's or Kruskal's algorithm.

* Edges to retain A solution can equally be represented by a set of edges; these 
edges must produce a connected graph containing all terminals. 


It is not possible to determine a priori which model is the best. It really depends 
on the type of algorithm that will be developed to solve the problem. For example, 
the first model might be better suited to a constructive algorithm, while the second 
might be better suited to a local search. 

The first part of this chapter gives various modeling examples for the graph 
coloring problem. It presents some techniques to transform the objective and the 
constraints of an optimization problem in order to obtain a model facilitating the 
design of solving algorithms. 

Sometimes, there is not just one clear objective to optimize, but several. The next 
part of this chapter introduces some concepts of multi-objective optimization. 

When faced with a new problem, it is not necessarily obvious how to find 
a good model. Sometimes, the “new” problem may even be a classic one that 
has not been recognized. The last part of this chapter gives some examples of 
practical applications that can be modeled as combinatorial problems presented in 
the previous chapter. 
3.1 Objective Function and Fitness Function


To ensure the problem has been understood, it is necessary to formally express 
or model its core. A problem can be modeled in various ways. Next, the solving 
methods must be adapted to the chosen model. This section presents various models 
for the vertex coloring problem. These models are illustrated with a small graph. 
Figure 3.1 gives an optimal coloring of this graph. 

In Sect. 1.1, we have seen that this problem could be modeled by a satisfiability 
problem. If a coloring with a minimum number of colors is wanted, a series of 
satisfiability problems can be solved. Unless working on small graphs and having an 
efficient satisfiability solver, this approach is hardly practicable. Another modeling, 
presented in the same section, consists in formulating an integer linear program. The 
objective (1.4) is to directly minimize the number of colors used. In general terms, 
a combinatorial optimization problem can be formulated as: 


Optimize: $f(s)$ (3.1)
Subject to: $s \in S$ (3.2)


The correspondence between this general model and the linear program presented
in Sect. 1.1.1 is as follows: the objective (3.1) is to minimize (1.4), which
is equivalent to minimizing the highest color index. Constraint (3.2) summarizes
Constraints (1.5), (1.6), (1.7), and (1.8).

The graph coloring problem can be expressed in a less formal way as: 


Minimize: $c = f_1(s)$ (3.3)
Subject to: Two adjacent vertices have different colors (3.4)
and: Number of colors used by $s \le c$ (3.5)



Fig. 3.1 Coloring the vertices of a graph with a minimum number of colors. This coloring is the best
possible since the graph contains a clique of four vertices
Here, function $f_1$ returns the index of the highest color used. Figure 3.2
provides a feasible coloring of a graph that is not optimal. For the solution given
in this figure, we have $f_1 = 5$. For the one given in Fig. 3.1, we have $f_1 = 4$.

Another graph coloring model is less intuitive. It consists in orienting the edges
of the graph without creating a circuit. The objective is to minimize the length of
the longest path in the directed graph:

Minimize: $f_2(s)$ = Length of the longest path in the oriented graph $G$ (3.6)
Subject to: The orientation of the edges of $G$ has no circuit


Once such a directed graph is obtained, a feasible coloring can easily be derived.
The vertices without a predecessor receive color number 1. No arc can connect two
of them, so there is no constraint violation.

These vertices are then removed from the graph, color number 2 is assigned to the
vertices that are now without a predecessor, and so on. Hence, the number of
colors obtained is one more than the length (number of arcs) of the longest path.
The coloring of a graph with this model is illustrated in Fig. 3.3.
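The peeling procedure above gives each vertex $v$ the color 1 + (length of the longest path ending at $v$), which can be computed in a topological order. A minimal sketch, assuming vertices `0..n-1` and the orientation given as a list of arcs `(u, v)`; it relies on `graphlib.TopologicalSorter` from the Python standard library (3.9 or later), which raises `graphlib.CycleError` if the orientation contains a circuit. The function name is hypothetical.

```python
from graphlib import TopologicalSorter


def color_from_orientation(n, arcs):
    """Color of v = 1 + length of the longest path ending at v (arcs must form no circuit)."""
    preds = {v: [] for v in range(n)}
    for u, v in arcs:
        preds[v].append(u)
    color = {}
    for v in TopologicalSorter(preds).static_order():    # predecessors come first
        color[v] = 1 + max((color[u] for u in preds[v]), default=0)
    return [color[v] for v in range(n)]


# A triangle oriented 0 -> 1 -> 2 and 0 -> 2: the longest path has 2 arcs, so 3 colors
print(color_from_orientation(3, [(0, 1), (1, 2), (0, 2)]))   # [1, 2, 3]
```

Every arc `(u, v)` gets `color[v] > color[u]`, so adjacent vertices always differ, and $f_2$ is simply the largest color minus one.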


Fig. 3.2 Graph colored with too many colors. A number of changes are required to remove the 
unnecessary color 




Fig. 3.3 Graph coloring obtained by directing the edges without circuit. One of the longest paths
is indicated in bold. For this solution, $f_2 = 4$ corresponds to a 5-coloring
Minimizing a maximum (or maximizing a minimum), as in (3.3) or (3.6), is
generally not a good way to discover or improve solutions satisfying
all constraints. In the context of local search, such a model contains very large
plateaus (see Sect. 5.1.3, Fig. 5.2).

What if a solution that has been discovered uses one color more than the
optimum, as shown in Fig. 3.2? Is there just one vertex that uses the extra color, or
are there many? The objective function does not tell. It is therefore often replaced
by a fitness function that is easier to optimize, for instance, one including
constraints that have been relaxed.


3.1.1 Lagrangian Relaxation 


A problem that can be put in the form:

Minimize: $f(s)$
Subject to: $s \in S$ (easy constraint)
and: $g(s) \le 0$ (constraint making the problem hard)

can be modeled as:

Minimize: $f(s) + \lambda \cdot \max(g(s), 0)$, with $\lambda$ being a parameter
Subject to: $s \in S$

If $\lambda$ is large enough, both models have the same optimal solution.


3.1.1.1 Lagrangian Relaxation for the Vertex Coloring Problem 


For the vertex coloring problem, Constraint (3.4) can be relaxed. A Lagrangian
relaxation of the problem is:

Minimize: $f_3(s) = c + \lambda \cdot$ Number of violations of (3.4) (3.7)
Subject to: (3.5)

For a sufficiently large $\lambda$ value (for instance, by setting $\lambda$ = chromatic number), a
solution optimizing (3.7) is also optimal for (3.3). For example, a triangle colored
with a single color has a fitness of $1 + 3\lambda$. The fitness is $2 + \lambda$ with two colors, and
the optimal coloring has a fitness of 3. For $0 < \lambda < 1/2$, the optimum solution of
(3.7) has one color; for $1/2 < \lambda < 1$, it has two colors; and for $\lambda > 1$, the optimum
is a feasible solution with three colors. For instance, the solution of Fig. 3.1 has a
fitness of $f_3 = 4$, that of Fig. 3.2 has $f_3 = 5$, and that of Fig. 3.4 has $f_3 = 4 + 2\lambda$.
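The fitness (3.7) is cheap to evaluate. A sketch, assuming a coloring is a list of color indices starting at 1 (one per vertex), the graph is an edge list, and $c$ is taken as the highest color used; `lam` stands for $\lambda$ and the function name is hypothetical. The usage lines reproduce the triangle example above with $\lambda = 1/4$, a value for which a single color is best.

```python
def fitness_f3(coloring, edges, lam):
    """f3(s) = c + lam * (number of edges whose endpoints share a color)."""
    c = max(coloring)
    violations = sum(1 for u, v in edges if coloring[u] == coloring[v])
    return c + lam * violations


triangle = [(0, 1), (1, 2), (0, 2)]
for coloring in ([1, 1, 1], [1, 1, 2], [1, 2, 3]):
    print(coloring, fitness_f3(coloring, triangle, 0.25))   # 1.75, then 2.25, then 3.0
```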
Fig. 3.4 Infeasible coloring of a graph with a given number of colors. For this solution, we have
$f_3 = 4 + 2\lambda$. For the one of Fig. 3.1, we have $f_3 = 4$. For the one of Fig. 3.2, we have $f_3 = 5$

Fig. 3.5 Partial coloring of a graph with a given number of colors. For this solution, we have
$f_4 = 1$


Instead of relaxing the constraint that no two adjacent vertices receive the same 
color, we can relax the constraint that each vertex be colored: 


Minimize: $f_4(s)$ = Number of uncolored vertices (3.8)
Subject to: (3.4)

A partial coloring of a graph with $f_4 = 1$ is given in Fig. 3.5.

Generally, the value of the multiplier $\lambda$ associated with a relaxed constraint
placed in the fitness function is modulated according to the success of the search:
if a feasible solution is discovered, the value of $\lambda$ is decreased. Conversely, if all
generated solutions are infeasible, the value of $\lambda$ is increased. Let us illustrate this
for the TSP.
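One simple way to implement this modulation is a multiplicative update after each batch of generated solutions. The factors below, like the function name, are hypothetical choices for illustration, not values prescribed by the text.

```python
def update_multiplier(lam, feasible_flags, decrease=0.9, increase=1.1):
    """Return the new penalty weight lambda.

    feasible_flags: one boolean per solution generated since the last update.
    If at least one was feasible, lambda decreases; if all were infeasible, it increases.
    The factors decrease and increase are illustrative assumptions.
    """
    if any(feasible_flags):
        return lam * decrease
    return lam * increase


lam = 1.0
lam = update_multiplier(lam, [False, False, False])   # all infeasible: lambda grows
lam = update_multiplier(lam, [False, True])           # a feasible solution: lambda shrinks
```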
3.1.1.2 Lagrangian Relaxation for the TSP


A 1-tree of minimum weight in a network with vertices $1, 2, \ldots, n$ is a minimum
spanning tree on the vertices $2, \ldots, n$ plus the two edges adjacent to vertex 1 with the
lowest weights. Vertex 1 plays a particular role, hence the term 1-tree. We can
reformulate the TSP by specifying that one seeks a 1-tree of minimum weight with
the constraint imposing a degree of two on every vertex:

Minimize: $z = \sum_{(i, j) \in H} d_{ij} x_{ij}$
Subject to: $\sum_{j=1}^{n} x_{ij} = 2 \quad (i = 1, \ldots, n)$ (3.9)
and: $H$ is a 1-tree


By relaxing Constraints (3.9) on the degree of each vertex and including them
with Lagrange multipliers in the objective, we get:

Minimize: $z(\lambda) = \sum_{(i, j) \in H} d_{ij} x_{ij} + \sum_{i=1}^{n} \lambda_i \left( \sum_{j=1}^{n} x_{ij} - 2 \right)$ (3.10)
Subject to: $H$ is a 1-tree

For fixed $\lambda_i$ $(i = 1, \ldots, n)$, the problem reduces to the construction of a 1-tree
of minimum weight in a network where the weight of the edge $(i, j)$ is shifted to
$d_{ij} + \lambda_i + \lambda_j$. With these modified weights, the length of a tour is the same as with
unmodified weights, but increased by $2 \sum_i \lambda_i$. Indeed, a tour enters each vertex $i$ once
and leaves it once, thus "paying" the penalty $\lambda_i$ twice. Since every tour is a 1-tree,
$z(\lambda)$ provides a lower bound on the length of the optimal tour. The value of this
bound can be improved by finding the $\lambda_i$ maximizing $z(\lambda)$.
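The bound $z(\lambda)$ can be computed with Prim's algorithm. In the sketch below, vertex 0 plays the role of vertex 1 of the text (0-based indices), `d` is a symmetric distance matrix with at least three vertices, and `lam` holds the $\lambda_i$. To search for good multipliers, it applies plain subgradient steps $\lambda_i \leftarrow \lambda_i + t\,(\deg_i - 2)$, the classical approach of Held and Karp; the diminishing step size $t$ is a hypothetical choice, and the function names are illustrative.

```python
import math


def one_tree_bound(d, lam):
    """Return z(lam) and the vertex degrees of a minimum 1-tree.

    Edge (i, j) weighs d[i][j] + lam[i] + lam[j]; z(lam) is the weight of the
    minimum 1-tree for these weights minus 2 * sum(lam). Needs n >= 3.
    """
    n = len(d)

    def w(i, j):
        return d[i][j] + lam[i] + lam[j]

    deg = [0] * n
    in_tree = [False] * n
    link = [math.inf] * n                 # cheapest known edge joining v to the tree
    parent = [-1] * n
    link[1] = 0.0
    total = 0.0
    for _ in range(n - 1):                # Prim's algorithm on vertices 1..n-1
        u = min((v for v in range(1, n) if not in_tree[v]), key=lambda v: link[v])
        in_tree[u] = True
        total += link[u]
        if parent[u] >= 0:
            deg[u] += 1
            deg[parent[u]] += 1
        for v in range(1, n):
            if not in_tree[v] and w(u, v) < link[v]:
                link[v], parent[v] = w(u, v), u
    for v in sorted(range(1, n), key=lambda v: w(0, v))[:2]:   # two cheapest edges at vertex 0
        total += w(0, v)
        deg[0] += 1
        deg[v] += 1
    return total - 2 * sum(lam), deg


def held_karp_bound(d, iterations=100, step=1.0):
    """Subgradient ascent on lam; returns the best lower bound z(lam) found."""
    lam = [0.0] * len(d)
    best = -math.inf
    for k in range(iterations):
        z, deg = one_tree_bound(d, lam)
        best = max(best, z)
        if all(g == 2 for g in deg):      # the 1-tree is a tour, so the bound is tight
            break
        t = step / (k + 1)                # hypothetical diminishing step size
        lam = [x + t * (g - 2) for x, g in zip(lam, deg)]
    return best


pts = [(0, 0), (2, 0), (3, 2), (1, 3), (-1, 2)]
d = [[math.dist(p, q) for q in pts] for p in pts]
print(held_karp_bound(d))   # a lower bound on the optimal tour length
```

Vertices of degree greater than 2 see their $\lambda_i$ increase, which makes their edges more expensive in the next 1-tree, and conversely for leaves; this is the mechanism illustrated in Fig. 3.6.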

Figure 3.6 illustrates various 1-trees that can be obtained by modifying the $\lambda$
values for a small TSP instance.
Fig. 3.6 Left: 1-tree on the TSP instance tsp225 obtained with all $\lambda$ values set to 0. The size of
filled disks is proportional to the value to add to the $\lambda$ associated with penalized vertices, and circles
are proportional to the value to remove. Top right: the 1-tree obtained with the first modification of
the $\lambda$. Bottom right: the 1-tree obtained after iterating the process. Only 12 vertices have a degree
different from 2, and the length of this 1-tree is about 7.5% above the length of the initial 1-tree
and 1.1% below the optimum tour length
3.1.2 Hierarchical Objectives


Rather than introducing several constraints with penalties in the fitness function,
another possibility is to consider hierarchical objectives. For the graph coloring
problem, a primary objective counts the number of colors used. The value of
this objective is chosen and fixed. A secondary objective measures the number
of constraint violations. Once a feasible solution is found, one can try reducing
the number of colors before starting again. If no feasible solution is achieved, the
number of colors of the primary objective is increased. Proceeding this way avoids
completely losing the work done so far.

The modeling choice has a significant influence on the problem-solving capacity
of a given method. Therefore, it is worth paying close attention to the problem
analysis phase. For instance, if a timetable problem is modeled by a graph to color,
the user might not necessarily be interested in the timetable using the smallest
possible number of time slots, which is the objective minimized by solving a standard
graph coloring problem. The user might just specify the maximum number of time slots
available; then any timetable using no more than these time slots could be fine. In
this case, a fitness function of type $f_3$ (with $\lambda$ close to 0 and $c$ fixed to the maximum
number of time slots desired) or $f_4$ would certainly be more convenient than $f_1$ or
$f_2$.

Relaxing constraints by including a penalty for their violations in the fitness
function can only be considered for a relatively small number of constraints.
However, proceeding like this is very common when dealing with "soft"
constraints, which correspond to preferences rather than to strict constraints. In
this case, using a model with hierarchical objectives may be a good option.

A more flexible and frequently used approach is to consider several objectives
simultaneously and to leave the user the choice of a compromise, that is, to prefer
a solution that is good for one objective rather than another. Such an approach
requires methods able to propose various solutions rather than a single one
optimizing a unique objective.


3.2 Multi-Objective Optimization 


Route planning is a typical multi-objective optimization example. To go from a 
point A to a point B, we have to use a given transportation network. This constitutes 
the constraints of the problem. We want to minimize the travel time of the journey, 
minimize the energy consumption, and maximize the pleasure of the journey. 

These objectives are generally conflicting. The same route can be done with a
lower energy consumption at the cost of an increased travel time. The
user will choose an effective route on a more subjective basis: for instance, by
"slightly" increasing the duration, the energy consumption "noticeably" decreases, and
the landscape is "picturesque."


Fig. 3.7 Example of a multi-objective problem: To travel from a departure place to a destination,
several routes can be selected. Each trip has a given cost and duration, indicated next to the arcs.
From the departure place, we can either go to the train station by bus or by taxi, or go directly to
the nearest airport by taxi. From the train station, we can reach either the nearest airport or another
airport. The latter is a little further away but better served, and its flights are more competitive. Then
we fly to the airport closest to the destination, from where we can reach the final destination either
by bus or by taxi


Figure 3.7 illustrates the case of someone who has to travel by air and who has 
the choice between several means of transportation to get to an airport and reach the 
final destination. 

An optimization problem with $K$ objectives can generally be formulated as:

"Minimize": $f(s) = (f_1(s), \ldots, f_K(s))$
Subject to: $s \in S$ (3.11)


This formulation assumes that one seeks to minimize each objective. This
does not constitute a loss of generality, since maximizing an objective is equivalent
to minimizing its opposite. A solution $s_1$ is said to dominate a solution $s_2$ if $s_1$
is better than $s_2$ on at least one objective and at least as good as $s_2$ on the other
objectives.

The purpose of multi-objective optimization is therefore to exhibit all non-dominated
solutions. These solutions are qualified as efficient or Pareto-optimal.
When each solution is represented as a point whose coordinates are its objective
values, the efficient solutions lying on the convex hull of the Pareto frontier are
called supported solutions.
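Dominance and the set of efficient solutions are easy to compute for a finite list of objective vectors. A sketch, assuming all objectives are minimized and each solution is represented only by its tuple of objective values; the names are hypothetical and the quadratic filter is meant for small lists.

```python
def dominates(a, b):
    """True if objective vector a dominates b (every objective minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def efficient(points):
    """Non-dominated (Pareto-optimal) vectors, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


print(efficient([(1, 5), (2, 2), (3, 3), (5, 1), (2, 4)]))   # [(1, 5), (2, 2), (5, 1)]
```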

Figure 3.8 shows all the solutions of the problem instance given in Fig. 3.7
in a cost/time diagram. We see in this figure that some efficient solutions are not
located on the convex hull.

The ultimate choice of a solution to the problem is left to the decision- 
maker. This choice is subjective, based, for instance, on political, ethical, or other 
considerations that cannot be reasonably quantified and therefore introduced neither 
in the objectives nor in the constraints. 
Fig. 3.8 Representation of the solutions of the problem of Fig. 3.7 in a cost/time diagram. The
nine efficient solutions are highlighted. Only supported solutions may be discovered by an exact
scalar method


3.2.1 Scalarizing 


A technique for generating various solutions to a multi-objective problem is to
transform it into a single-objective problem with as many parameters as there are
objectives. Thus, the multi-objective Problem (3.11) turns into a single-objective
problem with parameters $w_1, \ldots, w_K$:

Minimize: $\sum_{i=1}^{K} w_i \cdot f_i(s)$ (3.12)
Subject to: $s \in S$

This technique is known as linear scalarization. Let us suppose we have an exact
method for solving Problem (3.12). The supported solutions can be discovered by
varying the value of the weights $w_i$. For instance, by setting $w_i = 1$ for a given $i$ and
zero weights for the other objectives, the best possible solution for the $i$th criterion
can be found. Once these $K$ solutions are known, other supported solutions can be
found, if any exist, by considering a weight vector $(w_i)$ orthogonal to the hyperplane
passing through the objective vectors of these $K$ solutions. By reiterating the process
with each new solution found, the set of supported solutions can be generated.
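On a small, explicitly enumerated set of solutions, Problem (3.12) can be solved by inspection, which makes it easy to see which efficient solutions linear scalarization can reach. The sketch below assumes two objectives and scans weights $(w, 1 - w)$ on a regular grid; the grid, the lexicographic tie-breaking rule, the toy data and the function names are hypothetical. A coarse grid may miss a supported solution whose range of optimal weights is narrow.

```python
def scalarize(points, weights):
    """Minimizer of sum_i w_i * f_i over a finite set (ties broken lexicographically)."""
    return min(points, key=lambda p: (sum(w * f for w, f in zip(weights, p)), p))


def supported_by_grid(points, steps=100):
    """Collect the minimizers of w * f1 + (1 - w) * f2 for w = 0, 1/steps, ..., 1."""
    return {scalarize(points, (s / steps, 1 - s / steps)) for s in range(steps + 1)}


# (3, 1.8) is efficient but lies above the segment joining (2, 2) and (5, 1)
pts = [(1, 5), (2, 2), (3, 1.8), (5, 1)]
print(sorted(supported_by_grid(pts)))   # [(1, 5), (2, 2), (5, 1)]
```

The efficient vector `(3, 1.8)` is never returned, whatever the weights: this is precisely an unsupported solution.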
3.2.2 Sub-goals to Reach


The main issue with linear scalarization is that the unsupported efficient solutions
cannot be obtained. In the extreme case, the only supported solutions are the $K$
solutions individually optimizing a single objective. Unsupported solutions can be
extracted by fixing the minimum quality of a solution on one or more objectives.
The idea is to include one or more constraints while removing the same number of
objectives. If the solutions are constrained to have a value of at most $v_1$ for the first
objective, we obtain a problem with $K - 1$ objectives:

Minimize: $f(s) = (f_2(s), \ldots, f_K(s))$ (3.13)
Subject to: $f_1(s) \le v_1$
and: $s \in S$

In the example of Fig. 3.8, imposing a maximum budget $v_1 = 250$, the solution
with cost 248 and time 207 is found. This solution is not supported.
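On a finite toy set of objective vectors (hypothetical data, not those of Fig. 3.8), the sub-goal approach with two objectives amounts to filtering by the budget $v_1$ and minimizing the second objective. In this sketch, ties on the second objective are broken by the first so that the returned vector is efficient; the function name is hypothetical.

```python
def best_under_budget(points, v1):
    """Minimize f2 subject to f1 <= v1 over a finite set of (f1, f2) vectors.

    Returns None when no vector satisfies the constraint.
    """
    feasible = [p for p in points if p[0] <= v1]
    return min(feasible, key=lambda p: (p[1], p[0]), default=None)


pts = [(1, 5), (2, 2), (3, 1.8), (5, 1)]
print(best_under_budget(pts, 3.5))   # (3, 1.8): efficient but unsupported
```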


3.3 Practical Applications Modeled as Classical Problems 


Let us conclude this chapter by giving some examples of practical applications that 
can be modeled as academic problems. 


3.3.1 Traveling Salesman Problem Applications 


As we have taken the traveling salesman problem as the main thread of this book, 
we start by showing how to model in this form problems that have a priori nothing 
to do with performing tours. 


3.3.1.1 Minimizing Unproductive Moves in 3D Printing 


To produce parts in small quantities, especially to make prototypes directly with 
CAD software, additive manufacturing or 3D printing techniques are now used. One 
of these techniques is to extrude a thermoplastic filament which is deposited layer 
by layer and hardens immediately. 

It is particularly useful to minimize the unproductive moves of the extrusion head
when printing three-dimensional parts. To produce such a part, the 3D model is
sliced into layers whose thickness depends on the diameter of the extrusion head.
Each layer is then decomposed into a number of segments. If the extrusion head must
start printing a segment at a position other than the end of the previous segment, it
must perform an in-air move. The total length traveled by the head is considerable:
for the part shown in Fig. 3.9, about 10 cm wide, it represents nearly a kilometer.
The unproductive moves can represent a relatively large fraction of all the moves.
As a result, minimizing them allows a significant productivity gain.

Fig. 3.9 Part of a bicycle lamp, viewed in the PrusaSlicer software

The principle of transforming this problem into a traveling salesman problem is the
following. A naive attempt would create one city per segment endpoint. However, a
tour on these cities alone does not necessarily provide a solution to the extrusion
problem, since the tour might not go directly from one endpoint of a segment to the
other, in which case the segment is not printed.

Therefore, the tour must be forced to visit the other endpoint of a segment
immediately after visiting the first one. For this purpose, a city is added in
the middle of each segment. Since the standard TSP does not constrain the city
visiting order, the distance matrix must be adapted so that a good TSP tour
corresponds to a feasible solution to the extrusion problem. To force printing a
segment, the distance between each endpoint of the segment and its middle city is
zero. The head does take a while to print the segment, but this time is
incompressible since the segment must be printed anyway.

To prevent certain moves of the head, a large distance $M$ is associated with
prohibited moves. In this manner, a good tour will not include such moves of the
head. The value of $M$ should not be too large, to avoid numerical problems. A value
about 100 times the size of the object to be printed may be suitable. This technique
prevents connecting the middle city of a segment to any city other than the two
endpoints of the corresponding segment. Another constraint is to print all segments
of a layer before printing the next layer.
Fig. 3.10 Transformation of the problem of minimizing non-productive moves in 3D printing
into a TSP. Principle for assigning distances between six cities corresponding to segments [a, c]
and [e, g] to print. The "cities" b and f are placed in the middle of the segments to force the
extrusion head to go from one endpoint of a segment to the other one. The M value is relatively
large (typically, 100 times the size of the object). The value of p depends on the respective layers
of both segments: either 0 if they are in the same layer or M/10 if they are in adjacent layers or M
if they are in different and non-adjacent layers


Indeed, it is no longer possible to extrude material below an already printed layer.
Moreover, going back to a lower layer would significantly complicate the management
of the extrusion head, which could collide with the material of an upper layer when
printing a lower one. This can be prevented in the traveling salesman model by a
technique similar to the one presented above. Figure 3.10 illustrates how to build the
distance matrix for printing two segments.

The distance between endpoints of two different segments is penalized by a value $p$
depending on the segment layers. If both segments belong to the same layer, the
penalty is zero. If the segments are in adjacent layers, we can set $p = M/10$.
Otherwise, if the segments are neither in the same layer nor in adjacent layers, the
penalty is set to $p = M$. Thus, a good tour goes only once from a layer to the next
one, and its length corresponds to that of the unproductive moves plus $M/10$ times
the number of layers to print. Finally, two cities corresponding to the initial and
final positions of the extrusion head are added to complete the
model.
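The construction of Fig. 3.10 can be sketched as follows. Assumptions, all hypothetical: each segment is given as `(a, c, layer)` with endpoints as coordinate pairs and an integer layer number; segment `s` yields cities `3s` (endpoint `a`), `3s + 1` (middle) and `3s + 2` (endpoint `c`); the direct move between the two endpoints of the same segment is treated as prohibited (distance `M`), so that the tour must pass through the middle city; and the two cities for the initial and final head positions are left out.

```python
import math


def printing_matrix(segments, M):
    """Distance matrix of the TSP model for unproductive moves (Fig. 3.10)."""
    n = 3 * len(segments)
    dist = [[M] * n for _ in range(n)]               # prohibited moves by default
    for s in range(len(segments)):
        a, b, c = 3 * s, 3 * s + 1, 3 * s + 2
        for i in (a, b, c):
            dist[i][i] = 0
        dist[a][b] = dist[b][a] = 0                  # endpoint <-> middle: printing the segment
        dist[c][b] = dist[b][c] = 0
    for s, (a_s, c_s, layer_s) in enumerate(segments):
        for t, (a_t, c_t, layer_t) in enumerate(segments):
            if s == t:
                continue
            gap = abs(layer_s - layer_t)
            p = 0 if gap == 0 else (M / 10 if gap == 1 else M)   # layer penalty
            for i, u in ((3 * s, a_s), (3 * s + 2, c_s)):
                for j, v in ((3 * t, a_t), (3 * t + 2, c_t)):
                    dist[i][j] = math.dist(u, v) + p  # in-air move between endpoints
    return dist


segments = [((0, 0), (1, 0), 0), ((1, 1), (0, 1), 0), ((1, 0), (2, 0), 1)]
D = printing_matrix(segments, M=100)
print(D[2][3], D[2][6], D[1][4])   # 1.0 10.0 100
```

The matrix can then be handed to any TSP heuristic; only the tour order matters, and the moves with zero distance correspond to printing.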

It should be noted that traveling salesman models for minimizing unproductive
moves can lead to large instances. The part illustrated in Fig. 3.9 has a few
hundred thousand segments. Figure 3.11 illustrates the productivity gain that can be
obtained by optimizing unproductive moves. On the one hand, we have the moves
proposed by the PrusaSlicer software for just one layer. On the other hand, we have
the optimized moves obtained with a traveling salesman model. The length of the
unproductive moves is divided by about 9 for this layer.
Fig. 3.11 On the left, moves of the extrusion head, as produced by the PrusaSlicer software.
Darker segments correspond to unproductive moves. The total length of these moves is about
740.8 mm. On the right, optimized moves using a traveling salesman model. The length of non-
productive moves is about 84.5 mm


3.3.1.2 Scheduling Coloring Workshop 


One of the simplest scheduling problems is that of painting objects. The unique
resource is the machine coloring the objects. Each task has only one operation:
color the object i (i = 1, ..., n); the duration of the operation is $t_i$. After coloring
the object i, the machine must be cleaned to correctly color the next one, j; this
set-up time is $s_{ij}$. Note that, generally, $s_{ij} \neq s_{ji}$: indeed, dark pigments in a pastel
color are more visible than the opposite. After coloring all the objects, the machine
must be cleaned to be ready for the next day; the duration of this operation is r.
The goal is to find the best coloring order of the objects. Hence, we look for a
permutation p of the n objects minimizing $\sum_{i=1}^{n-1} (t_{p_i} + s_{p_i p_{i+1}}) + t_{p_n} + r$.
This scheduling problem with set-up times can be reduced to a traveling salesman
instance with n + 1 cities. To prove this, let $w_{ij} = t_i + s_{ij}$ (i, j = 1, ..., n),
$w_{i0} = t_i + r$, and $w_{0i} = 0$ (i = 1, ..., n). We can verify that the shortest tour on
the cities 0, ..., n provides the optimal order to color the objects. The “object” 0
represents the beginning of a workday.
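The reduction can be checked numerically. In the sketch below, whose function names and data are illustrative only, the objects are numbered 1, ..., n in the matrix w so that city 0 is the start of the day; the loop verifies by brute force that every tour starting from city 0 has exactly the length of the corresponding coloring schedule.

```python
from itertools import permutations


def schedule_cost(order, t, s, r):
    """Total time to color the objects in the given order, then clean for r."""
    cost = sum(t[a] + s[a][b] for a, b in zip(order, order[1:]))
    return cost + t[order[-1]] + r


def coloring_to_tsp(t, s, r):
    """Matrix w on cities 0..n: w_ij = t_i + s_ij, w_i0 = t_i + r, w_0i = 0."""
    n = len(t)
    w = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w[i][0] = t[i - 1] + r
        for j in range(1, n + 1):
            if i != j:
                w[i][j] = t[i - 1] + s[i - 1][j - 1]
    return w  # row 0 stays at 0: leaving the "start of day" city is free


def tour_length(w, tour):
    return sum(w[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


t = [3, 2, 4]                          # coloring durations (illustrative)
s = [[0, 1, 5], [2, 0, 1], [4, 3, 0]]  # asymmetric set-up times
r = 2                                  # final cleaning
w = coloring_to_tsp(t, s, r)
for order in permutations(range(len(t))):
    tour = [0] + [i + 1 for i in order]
    assert tour_length(w, tour) == schedule_cost(order, t, s, r)
```

Since both costs agree for every order, a shortest tour on w yields an optimal coloring order.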


3.3.2 Linear Assignment Modeled by Minimum Cost Flow 


The linear assignment mathematical model given in Sect. 2.5.1 is both concise
and rigorous. However, it does not indicate how to solve a problem instance.
Using general integer linear programming solvers might be inappropriate, unless
the solver automatically detects the assignment constraints (2.34) and (2.35) and
incorporates an ad hoc algorithm to process them. This is frequently the case.


Fig. 3.12 Matching problem: how many mixed couples can be formed between girls and boys? A 
compatible matching is represented by an edge in a bipartite graph. The problem can be modeled 
by searching for a maximum possible flow from a vertex s to a vertex t in a network 


The maximum matching problem can be treated as a special linear assignment
problem: if i can be matched with u, we set a cost of 0 on the edge;
otherwise, a positive cost is set. A minimum cost assignment then uses as few
edges with positive cost, and as many edges with 0 cost, as possible. The
maximum matching problem can also be modeled with a maximum flow in a bipartite
network. Figure 3.12 illustrates how a matching problem can be solved with a flow
in a network.
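To make the flow model concrete, here is a minimal sketch of the classical augmenting path method for maximum bipartite matching. Each successful call of `augment` corresponds to pushing one more unit of flow from s to t in the unit-capacity network of Fig. 3.12. The adjacency-list input format is our assumption.

```python
def max_matching(adj, n_right):
    """Maximum bipartite matching by augmenting paths.

    adj[i] lists the right vertices compatible with left vertex i.
    Returns the matching size and match[u], the left vertex matched to u.
    """
    match = [None] * n_right

    def augment(i, seen):
        # Depth-first search for an augmenting path starting at left vertex i.
        for u in adj[i]:
            if u not in seen:
                seen.add(u)
                if match[u] is None or augment(match[u], seen):
                    match[u] = i
                    return True
        return False

    size = sum(1 for i in range(len(adj)) if augment(i, set()))
    return size, match
```

For instance, `max_matching([[0, 1], [0], [1, 2], [2]], 3)` matches three of the four left vertices, since only three right vertices exist.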

Similarly, the linear assignment problem can be modeled by a minimum cost
flow. A bipartite network $R = (I \cup U \cup \{s\} \cup \{t\}, E, w, c)$ is built similarly to the
matching problem presented in Fig. 3.12. Every pair of nodes ($i \in I$, $u \in U$) is
connected by an arc (i, u) with capacity w(i, u) = 1 and cost c(i, u). An arc (s, i)
of capacity w(s, i) = 1 with cost 0 connects s to each node $i \in I$. An arc (u, t) of
capacity w(u, t) = 1 with cost 0 connects each node $u \in U$ to t.

Finding a minimum cost flow in R allows finding the optimal assignment in
polynomial time, for instance, with Algorithm 2.6.
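The following sketch builds the network R and computes a minimum cost flow by successive shortest paths. This is an illustrative method of our choosing, not necessarily Algorithm 2.6: Bellman-Ford is used because the residual reverse arcs have negative costs, and the node numbering (s = 0, items 1 to n, positions n + 1 to 2n, t = 2n + 1) is our own.

```python
def min_cost_assignment(c):
    """Linear assignment as a minimum cost flow in the network R.

    Nodes: s = 0, items 1..n, positions n+1..2n, t = 2n+1 (our numbering).
    All arcs have capacity 1; n successive shortest paths send n units of flow.
    """
    n = len(c)
    s, t, size = 0, 2 * n + 1, 2 * n + 2
    out = [[] for _ in range(size)]  # arc indices leaving each node
    head, cap, cost = [], [], []

    def add_arc(a, b, w):
        # Arc 2m is a -> b; arc 2m + 1 is its residual reverse arc b -> a.
        for u, v, cp, cs in ((a, b, 1, w), (b, a, 0, -w)):
            out[u].append(len(head))
            head.append(v)
            cap.append(cp)
            cost.append(cs)

    for i in range(n):
        add_arc(s, 1 + i, 0)
        add_arc(1 + n + i, t, 0)
        for u in range(n):
            add_arc(1 + i, 1 + n + u, c[i][u])

    total = 0
    for _ in range(n):
        dist = [float("inf")] * size
        pred = [None] * size  # arc used to reach each node
        dist[s] = 0
        for _ in range(size - 1):  # Bellman-Ford on the residual network
            changed = False
            for a in range(size):
                if dist[a] == float("inf"):
                    continue
                for e in out[a]:
                    if cap[e] > 0 and dist[a] + cost[e] < dist[head[e]]:
                        dist[head[e]] = dist[a] + cost[e]
                        pred[head[e]] = e
                        changed = True
            if not changed:
                break
        node = t
        while node != s:  # push one unit of flow along the shortest path
            e = pred[node]
            cap[e] -= 1
            cap[e ^ 1] += 1
            node = head[e ^ 1]
        total += dist[t]

    # Item i is assigned to the position whose arc (i, u) is saturated.
    assignment = [next(head[e] - 1 - n for e in out[1 + i]
                       if head[e] > n and cap[e] == 0)
                  for i in range(n)]
    return total, assignment
```

On the cost matrix `[[4, 1, 3], [2, 0, 5], [3, 2, 2]]`, it returns a total cost of 5, with items 0, 1, 2 assigned to positions 1, 0, 2.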

More efficient algorithms have been designed for the linear assignment problem.
This leads us to make a comment on the integer linear program presented in
Sect. 2.5.1. A peculiarity of its constraint matrix is that it contains only 0s and 1s.
It is in fact the incidence matrix of a bipartite graph, and it can be proved that such a
matrix is totally unimodular. This means that the determinant of any square sub-matrix
is either −1, 0 or 1. Hence, since the right-hand sides are integers, the integrality
constraints (2.36) can be omitted: the resulting linear program provides an integer
optimal solution.
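Total unimodularity can be checked by brute force on very small instances, which makes the definition concrete. The sketch below (names ours) builds the constraint matrix of an n × n assignment problem, with one row per item, one row per position and one column per variable $x_{iu}$, and tests every square sub-matrix. The enumeration is exponential and only meant as an illustration.

```python
from itertools import combinations


def det(m):
    """Integer determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        if a:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * a * det(minor)
    return total


def assignment_matrix(n):
    """Constraint matrix of the n x n assignment problem (column k is x_iu
    with i = k // n and u = k % n)."""
    rows = [[1 if k // n == i else 0 for k in range(n * n)] for i in range(n)]
    rows += [[1 if k % n == u else 0 for k in range(n * n)] for u in range(n)]
    return rows


def is_totally_unimodular(a):
    """True if every square sub-matrix of a has determinant -1, 0 or 1."""
    m, n = len(a), len(a[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[a[i][j] for j in cols] for i in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True
```

By contrast, the incidence matrix `[[1, 1, 0], [0, 1, 1], [1, 0, 1]]` of a triangle, which is not bipartite, has determinant 2 and is therefore not totally unimodular.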


3.3.3 Map Labeling Modeled by Stable Set 


An application of the maximum weight stable set appears in map labeling. When
one wishes to associate information with objects drawn on a map, the problem is to
choose the label positions so that they do not overlap. Figure 3.13 illustrates a tiny
problem of labeling three cities.
Fig. 3.13 Map labeling problem: the names of three objects must be placed on a map. The texts
should not overlap to maintain readability. For this instance, four possibilities are considered for
the label position of each object. In terms of a graph, this problem can be modeled by a maximum
stable set. The vertices of the stable set correspond to the chosen label positions. Here, the name of
each city can be placed at the top right corner


This problem can be transformed into a maximum stable set problem as follows. A
vertex is created for each potential label position. The vertices corresponding
to the same label are connected in a clique, since there should be only one label per
object. The vertices corresponding to overlapping labels are also connected with an
edge. Hence, a stable set corresponds to label positions without overlap. To display
as many labels as possible on the map, a maximum stable set is searched for in the graph.

In practice, not all positions are equivalent. Indeed, according to the language
and the culture, there are preferred positions. For instance, in Western countries,
one prefers to place the names at the top right corner rather than at the bottom
left one. Preferences can be modeled as weights associated with positions. The map
labeling problem then consists of finding a maximum weight stable set.
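For tiny instances, the maximum weight stable set of the labeling graph can be found by brute force, as in the sketch below. Instead of building the graph explicitly, it picks at most one candidate position per object, which enforces the clique edges, and rejects any choice containing an overlapping pair. The input format (position identifiers, a set of overlapping pairs, a weight per position) is hypothetical, and the running time is exponential.

```python
from itertools import product


def best_labeling(candidates, overlaps, weight):
    """Brute-force maximum weight stable set for map labeling.

    candidates[o]: candidate positions of object o (at most one is chosen,
    which plays the role of the clique edges); overlaps: set of frozensets
    {p, q} of overlapping positions; weight[p]: preference of position p.
    """
    best_value, best_choice = -1, None
    for choice in product(*[[None] + list(c) for c in candidates]):
        chosen = [p for p in choice if p is not None]  # None: object unlabeled
        if any(frozenset((p, q)) in overlaps
               for k, p in enumerate(chosen) for q in chosen[k + 1:]):
            continue  # two overlapping labels: not a stable set
        value = sum(weight[p] for p in chosen)
        if value > best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice


candidates = [["a_top", "a_bottom"], ["b_top", "b_bottom"]]
overlaps = {frozenset(("a_top", "b_top"))}
weight = {"a_top": 3, "a_bottom": 1, "b_top": 2, "b_bottom": 1}
print(best_labeling(candidates, overlaps, weight))  # (4, ('a_top', 'b_bottom'))
```

Here, the preferred top positions overlap, so the best compromise keeps the heavier top label and moves the other one to the bottom.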

Other problems can be modeled in exactly the same way. The first is berth
allocation for docking ships. Translated in terms of labeling, a rectangular label is
associated with each ship. The label width is the expected duration of the stopover,
and the label height is the length of the ship. The possible positions for this
label are determined by the ship’s arrival time and by the dock locations that can
accommodate the ship.

Another application of this problem is flight level allocation for commercial
aircraft. Knowing the expected departure times of each aircraft and the routes they
take, the potential areas of collision between aircraft are first determined. The size
of these zones depends on the uncertainty of the actual departure times and the
effective routes followed. A label therefore corresponds to a zone. The possible
positions of the labels are the various flight levels that the aircraft could use.
3.1 Assigning Projects to Students

Four students (A, B, C, D) must choose among a set of four semester projects
(1, 2, 3, 4). Each student makes a grade prediction for each project. What choices
should the students make to maximize their grade point average?


                 Project
Student     1     2     3     4
A          6.0   5.0   5.8   5.5
B          6.0   5.5   4.5   4.8
C          6.0   5.4   4.0   …
D          5.5   4.5   5.0   3.8

3.2 Placing Production Units 

A company has three existing production units 1, 2, and 3 and wants to open three 
new units, 4, 5, and 6. Three candidate locations a, b, and c are retained for the new 
units. Figure 3.14 illustrates the locations of the existing and new production units. 

The parts produced must be transferred from an existing unit to a new unit using 
only the connections indicated in the figure. For instance, the distance between the 
unit 1 and the location b is 3 + 4 = 7. The numbers of daily transfers between the 
existing and new units are given in Table 3.1. 

Where to place these new production units to minimize the total transfer 
distance? 


Fig. 3.14 Location of production units
Table 3.1 Number of daily
transfers between existing
and new units


If parts are also transferred between new units, what kind of problem should we 
solve? 


3.3 Oral Examination 

Six students (A, ..., F) undergo oral examinations for different modules (1, ..., 5). 
The duration of each examination is 1 hour. How to build a timetable as short as 
possible, knowing that both students and teachers can have only one examination at 
a time? 


Table: examinations to be taken (rows: modules 1, ..., 5; columns: students A, ..., F; an x marks an examination)


3.4 Written Examination 

The following table summarizes the student enrolments for written examinations
of various modules. Each student can take at most one examination a day. All
students taking the same module are examined on the same day. How many days, at
minimum, are needed to organize the examination session?


Table: student enrolments (rows: modules; columns: students)
3.5 QAP with More Positions than Items

How can the QAP model be adapted when there are fewer elements to be placed
(n) than positions (m)? Same question if there is a fixed cost $c_{ir}$ for assigning the
item i to the position r.


3.6 Mobile Phone Keyboard Layout 

We wish to configure the keys of an old mobile phone. We want to place only the 26 
lowercase letters as well as the space on the keys 0, 2, 3, ..., 9. Up to four symbols 
can be placed per key. 

For the language considered, $g_{ij}$ represents the frequency of appearance of the
symbol j after symbol i. To enter a symbol in position p on a key, it is necessary to
press the key p times. This requires p time units. To switch from one key to another,
the travel time is one unit. To enter two symbols located on the same key, we have
to wait 6 time units. How to build a numerical instance of a QAP for this problem?


3.7 Graph Bipartition to QAP 

The bipartition problem is to separate the vertices of a graph into two subsets X
and Y of identical size (assuming an even number of nodes) so that the number of
edges having one end in X and the other in Y is as low as possible. How to build a
quadratic assignment instance for the graph bipartition?


3.8 TSP to QAP 
How to build a quadratic assignment instance corresponding to a TSP instance? 


3.9 Special Bipartition 

We consider a set of cards numbered from 1 to 50. We want to split up the cards into 
two subsets. The sum of the numbers of the first should be 1170, and the product of 
the others should be 36,000. How to code a solution attempt to this problem? How 
to assess the quality of a solution attempt? 


3.10 Magic Square 

We want to create a magic square of order n. This square has n × n cells to be filled
with the numbers 1 to $n^2$. The sum of the numbers in each row, column, and
diagonal must be $(n^3 + n)/2$. A magic square of order 4 is given below.

     4  14  15   1   → 34
     9   7   6  12   → 34
     5  11  10   8   → 34
    16   2   3  13   → 34
     ↓   ↓   ↓   ↓
    34  34  34  34      (both diagonals also sum to 34)
How to code a solution attempt to this problem? How to assess the quality of a
solution attempt?


3.11 Glass Plate Manufacturing 

For producing glass plates, the molten glass passes on a chain of m machines in the
order 1, 2, ..., m. Depending on the desired features for each plate, the processing
time on each machine differs. It is assumed there are n different plates to produce
(in an order which can be decided by the chain manager). The processing time of
the plate i on the machine j is $t_{ij}$. Additionally, when a machine has completed
the processing of a plate, the plate must immediately move to the next machine
without waiting time; otherwise, the glass cools down. A machine only processes
one plate at a time. The chain manager must determine in which order to produce
the n plates to complete the production as quickly as possible. How to model this
no-wait permutation flowshop problem as a TSP?


3.12 Optimal 1-Tree 
Find values $\lambda_1, \dots, \lambda_5$ to assign to the five nodes of the network given in Fig. 2.2 such
that:

• $\sum_{i=1}^{5} \lambda_i = 0$
• The weight of the 1-tree associated with these values is as high as possible
Part II
Basic Heuristic Techniques 


This part introduces the building blocks of heuristics. First are the constructive 
methods. Then, once a solution is available, it can be improved with a local 
search. Finally, if either the problem is complex or the dataset is relatively large, 
decomposition methods can be used. 


Chapter 4
Constructive Methods


Having ascertained that the problem to be solved is intractable and that the design of 
a heuristic is justified, the next step is to imagine how to construct a solution. This 
step is directly related to the problem modeling. 


4.1 Systematic Enumeration 


When we have to discover the best possible solution for a combinatorial optimization
problem, the first idea that comes to mind is to try to build all the solutions to the
problem, evaluate their feasibility and quality, and return the best one that satisfies all
constraints. Clearly, this approach can only be applied to problems of moderate
size. Let us examine the example of a small 0-1 knapsack instance with
two constraints:


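$$
\begin{aligned}
\max\ r = {} & 9x_1 + 5x_2 + 7x_3 + 3x_4 + x_5 \\
\text{subject to: } & 4x_1 + 3x_2 + 5x_3 + 2x_4 + x_5 \le 10 \\
& 4x_1 + 2x_2 + 3x_3 + 2x_4 + x_5 \le 7 \\
& x_i \in \{0, 1\} \quad (i = 1, \dots, 5)
\end{aligned}
\tag{4.1}
$$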


To list all the solutions of this instance, an enumeration tree is constructed. The
first node separates the solutions for which $x_1 = 0$ from those where $x_1 = 1$. The
second level consists of the nodes separating $x_2 = 0$ and $x_2 = 1$, etc. Potentially, this
problem has $2^5 = 32$ solutions, many of which are unfeasible because of constraint
violations. Formally, the first node generates two sub-problems that will be solved
recursively. The first sub-problem is obtained by setting $x_1 = 0$ in (4.1):
$$
\begin{aligned}
\max\ r = {} & 0 + 5x_2 + 7x_3 + 3x_4 + x_5 \\
\text{subject to: } & 3x_2 + 5x_3 + 2x_4 + x_5 \le 10 \\
& 2x_2 + 3x_3 + 2x_4 + x_5 \le 7 \\
& x_i \in \{0, 1\} \quad (i = 2, \dots, 5)
\end{aligned}
$$


The second sub-problem is obtained by setting $x_1 = 1$ in (4.1):


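$$
\begin{aligned}
\max\ r = {} & 9 + 5x_2 + 7x_3 + 3x_4 + x_5 \\
\text{subject to: } & 3x_2 + 5x_3 + 2x_4 + x_5 \le 6 \\
& 2x_2 + 3x_3 + 2x_4 + x_5 \le 3 \\
& x_i \in \{0, 1\} \quad (i = 2, \dots, 5)
\end{aligned}
$$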


To avoid enumerating too many solutions, the tree can be pruned by noting
that all branches arising from a node with a constraint violation will lead to
unfeasible solutions. Indeed, for this problem instance, all constraint coefficients
are non-negative. For instance, if the variables $x_1$, $x_2$, $x_3$ are already fixed to 1, both
constraints are violated, and all the sub-problems that could be created from there
will produce unfeasible solutions. Therefore, it is useless to develop this branch by
trying to set values of the $x_4$ and $x_5$ variables.
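The enumeration tree with this pruning rule can be sketched as a short recursive function. It is an illustration written for instances like (4.1) only: a branch is abandoned as soon as the partially fixed variables violate a constraint, which is valid because all coefficients are non-negative.

```python
def enumerate_feasible(r, A, b):
    """Enumeration tree for max r.x, A x <= b, x binary (all coefficients >= 0).

    A branch is abandoned as soon as the fixed variables violate a constraint.
    Returns the best value, the best vector and the number of tree nodes created.
    """
    n = len(r)
    best_value, best_x, nodes = -1, None, 0

    def explore(x):
        nonlocal best_value, best_x, nodes
        nodes += 1
        k = len(x)
        for row, bi in zip(A, b):
            if sum(row[j] * x[j] for j in range(k)) > bi:
                return  # every completion of x is unfeasible
        if k == n:
            value = sum(r[j] * x[j] for j in range(n))
            if value > best_value:
                best_value, best_x = value, x
            return
        explore(x + (0,))  # sub-problem with the next variable set to 0
        explore(x + (1,))  # sub-problem with the next variable set to 1

    explore(())
    return best_value, best_x, nodes


value, x, nodes = enumerate_feasible([9, 5, 7, 3, 1],
                                     [[4, 3, 5, 2, 1], [4, 2, 3, 2, 1]], [10, 7])
print(value, x)  # 16 (1, 0, 1, 0, 0)
```

The number of nodes created is smaller than the 63 nodes of the complete tree, since, for instance, the branch with $x_1 = x_2 = x_3 = 1$ is never developed.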

Another way to prune the non-promising branches is to estimate by a short 
computation whether a sub-problem could lead to a better solution than the best 
found so far. This is the branch and bound method. 


4.1.1 Branch and Bound 


To quickly estimate whether a sub-problem may have a solution, and whether the latter
is promising, a technique is to relax one or more constraints. The optimal solution
of the relaxed problem is not necessarily feasible for the initial one. However, a few
interesting properties can be deduced by solving the relaxed problem. If the latter
has no solution, or if its optimal solution is worse than the best feasible solution
already found, then there is no need to develop the branch from this sub-problem. If
the optimal solution of the relaxed sub-problem is feasible for the initial problem,
then developing the branch is also unnecessary. In addition to the Lagrangian
relaxation seen above (Sect. 3.1.1), several relaxation techniques are commonly
used to simplify a sub-problem.


Variable integrality Imposing integer variables makes Problem (4.1) difficult. We 
can therefore remove this constraint and solve the problem: 
$$
\begin{aligned}
\max\ S = {} & 9x_1 + 5x_2 + 7x_3 + 3x_4 + x_5 \\
\text{subject to: } & 4x_1 + 3x_2 + 5x_3 + 2x_4 + x_5 \le 10 \\
& 4x_1 + 2x_2 + 3x_3 + 2x_4 + x_5 \le 7 \\
& 0 \le x_i \le 1 \quad (i = 1, \dots, 5)
\end{aligned}
\tag{4.2}
$$


This linear program can be solved efficiently, in polynomial time. Its optimal
solution is (0.5; 1; 1; 0; 0), with an objective value of 16.5. Since it comprises a
fractional value, this solution is not feasible for the initial problem. However,
it informs us that there is no solution to Problem (4.1) whose value exceeds 16.5
(or even 16, since all the coefficients are integers). Therefore, if an oracle gives
us the feasible solution (1; 0; 1; 0; 0) of value 16, we can deduce that this solution is
optimal for the initial problem.

Constraint aggregation (surrogate constraint) A number of constraints are linearly
combined to get another one. In our simple example, summing both constraints, we get:


$$
\begin{aligned}
\max\ S = {} & 9x_1 + 5x_2 + 7x_3 + 3x_4 + x_5 \\
\text{subject to: } & 8x_1 + 5x_2 + 8x_3 + 4x_4 + 2x_5 \le 17 \\
& x_i \in \{0, 1\} \quad (i = 1, \dots, 5)
\end{aligned}
\tag{4.3}
$$


This problem is a standard knapsack, which is easier to solve than the initial problem.
The solution (1; 1; 0; 1; 0) is optimal for the relaxed Problem (4.3) but is not
feasible for the initial problem because the second constraint is violated. As the
relaxed problem is still NP-hard, this approach may be problematic.
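Although the knapsack problem is NP-hard, instances with a small integer capacity are solved easily by dynamic programming in O(n × capacity) time, which is pseudo-polynomial. The sketch below uses this standard textbook method, not code from this book, to solve the surrogate relaxation (4.3).

```python
def knapsack_01(values, weights, capacity):
    """Dynamic programming for the 0-1 knapsack with integer weights."""
    n = len(values)
    # best[k][c]: best value using the first k objects with capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        v, w = values[k - 1], weights[k - 1]
        for c in range(capacity + 1):
            best[k][c] = best[k - 1][c]
            if w <= c and best[k - 1][c - w] + v > best[k][c]:
                best[k][c] = best[k - 1][c - w] + v
    x, c = [0] * n, capacity  # backtrack to recover the chosen objects
    for k in range(n, 0, -1):
        if best[k][c] != best[k - 1][c]:
            x[k - 1] = 1
            c -= weights[k - 1]
    return best[n][capacity], x
```

For instance, `knapsack_01([9, 5, 7, 3, 1], [8, 5, 8, 4, 2], 17)` returns `(17, [1, 1, 0, 1, 0])`, the solution of (4.3) mentioned above.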

Combined relaxation Clearly, several types of relaxation can be combined, for
instance, the aggregation of constraints and the relaxation of variable integrality. For our
small example, we get:


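$$
\begin{aligned}
\max\ S = {} & 9x_1 + 5x_2 + 7x_3 + 3x_4 + x_5 \\
\text{subject to: } & 8x_1 + 5x_2 + 8x_3 + 4x_4 + 2x_5 \le 17 \\
& 0 \le x_i \le 1 \quad (i = 1, \dots, 5)
\end{aligned}
\tag{4.4}
$$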


This problem can be solved in $O(n \log n)$ as follows: the variables are sorted in
decreasing order of $r_i/v_i$, where $r_i$ represents the revenue of the object
i and $v_i$ its aggregated volume. In our example, the indices are already sorted.
The objects are selected one after the other in this order until the next object would
overfill the knapsack. This leads to $x_1 = x_2 = 1$. The next object is split to
completely fill the knapsack ($x_3 = 4/8$, for a total value of the knapsack
$S = 9 + 5 + 7 \cdot 4/8 = 17.5$). Since all the coefficients are integers in our
example, $\lfloor 17.5 \rfloor = 17$ is also a valid bound for the optimal value of the
initial problem.
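The greedy procedure just described can be sketched as follows (function name ours); it assumes all weights are positive.

```python
def fractional_knapsack(values, weights, capacity):
    """Optimal value of the continuous knapsack: objects by decreasing
    value/weight ratio, the first one that does not fit is split."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    x = [0.0] * len(values)
    total, room = 0.0, capacity
    for i in order:
        if weights[i] <= room:
            x[i] = 1.0
            room -= weights[i]
            total += values[i]
        else:
            x[i] = room / weights[i]  # split the object to fill the knapsack
            total += values[i] * x[i]
            break
    return total, x
```

On relaxation (4.4), `fractional_knapsack([9, 5, 7, 3, 1], [8, 5, 8, 4, 2], 17)` returns the value 17.5 with x = [1, 1, 0.5, 0, 0], as computed by hand above.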


Algorithm 4.1 provides the general framework of the branch and bound method. 
Figure 4.1 shows a partial enumeration tree that can be obtained by solving the 
small problem instance (4.1). Three components should be specified by the user for 
implementing a complete algorithm. 
Algorithm 4.1: Branch and bound framework for an objective to maximize.
It is necessary to provide three methods: α for the management of the
sub-problems to be solved (generally, a priority queue based on a heuristic
criterion, or a stack), a method β for evaluating the relaxation of the
sub-problems, and a heuristic γ for choosing the next variable to separate for
generating new sub-problems

Input: A problem with n variables x_1, ..., x_n; policy α for managing sub-problems;
       relaxation method β; branching method γ
Result: An optimal solution x* of value f*

1  f* ← −∞                         // Value of best solution found
2  F ← ∅                           // Set of fixed variables
3  L ← {x_1, ..., x_n}             // Set of free variables
4  Q ← {(F, L)}                    // Set of sub-problems to solve
5  while Q ≠ ∅ do
6      Remove a problem P = (F, L) from Q according to policy α
7      if P can potentially have feasible solutions with values already fixed in F then
8          Compute a relaxation x of P with method β, modifying only variables x_k ∈ L
9          if x is feasible for the initial problem and f* < f(x) then   // Store the improved solution
10             x* ← x
11             f* ← f(x)
12         else if f(x) > f* then                                         // Expand the branch
13             Choose x_k ∈ L according to policy γ
14             forall possible values v of x_k do
16                 Q ← Q ∪ {(F ∪ {x_k = v}, L \ {x_k})}
17         else                                   // No solution better than x* can be obtained
18             Prune the branch


First, the management policy of the set Q of sub-problems awaiting treatment
must be specified. If Q is managed as a queue, we have a breadth-first search. If
Q is managed as a stack, we have a depth-first search; the latter promotes a rapid
discovery of a feasible solution to the initial problem.

A frequent choice is to manage Q as a priority queue. This implies computing
an evaluation for each sub-problem. Ideally, this evaluation should be strongly
correlated with the best feasible solution that can be obtained by developing the
branch. A typical example is to use the value S of the relaxed problem. The choice of
a management method for Q is frequently based on empirical considerations.

The second component to be defined by the user is the relaxation technique. This 
is undoubtedly one of the most delicate points for designing an efficient branch 
and bound. This point strongly depends on both the problem to be solved and the 
numerical data. 

The third choice left is how to separate the problem into sub-problems. A simple
policy is to choose the smallest index variable or a non-integer variable in the
solution of the relaxed problem. Frequently, the policy adopted for branching is
empirical.

Fig. 4.1 Solving Problem (4.1) with a branch and bound. Sub-problem set Q managed as a stack.
The nodes are numbered by creation order. Branching is done by increasing variable index. Nodes
9 and 7 are pruned because they cannot lead to feasible solutions. Node 10 is pruned because it
cannot lead to a solution better than node 11

A simple implementation of this framework is the A* search algorithm. The latter
manages Q as a priority queue and evaluates a heuristic value before inserting a
sub-problem in Q.
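To illustrate how the three components of Algorithm 4.1 fit together, here is a minimal sketch for 0-1 problems such as (4.1), written under our own choices: α is a priority queue keyed on the relaxation value of the parent (best first), β is the combined relaxation of (4.4) applied to the free variables, and γ picks the free variable of smallest index. It assumes non-negative constraint coefficients and positive aggregated volumes.

```python
import heapq


def branch_and_bound(r, A, b):
    """Sketch of Algorithm 4.1 for max r.x subject to A x <= b, x binary."""
    n = len(r)
    v = [sum(row[j] for row in A) for j in range(n)]  # aggregated volumes
    capacity = sum(b)                                 # aggregated capacity

    def partial_ok(x):  # line 7: can the fixed values still be feasible?
        return all(sum(row[j] * x[j] for j in range(len(x))) <= bi
                   for row, bi in zip(A, b))

    def relax(fixed):  # beta: fractional surrogate knapsack on free variables
        k = len(fixed)
        x = list(fixed) + [0.0] * (n - k)
        value = sum(r[j] * fixed[j] for j in range(k))
        room = capacity - sum(v[j] * fixed[j] for j in range(k))
        for j in sorted(range(k, n), key=lambda j: r[j] / v[j], reverse=True):
            x[j] = min(1.0, room / v[j]) if room > 0 else 0.0
            value += r[j] * x[j]
            room -= v[j] * x[j]
        return value, x

    best_value, best_x = float("-inf"), None
    queue = [(0.0, ())]  # alpha: entries (-priority, fixed values)
    while queue:
        _, fixed = heapq.heappop(queue)
        if not partial_ok(fixed):
            continue  # no feasible completion: drop the sub-problem
        bound, x = relax(fixed)
        if all(xj in (0, 1) for xj in x) and partial_ok(x) and bound > best_value:
            best_value, best_x = bound, [int(xj) for xj in x]  # store improvement
        elif bound > best_value:
            for val in (0, 1):  # gamma: branch on the next free variable
                heapq.heappush(queue, (-bound, fixed + (val,)))
        # otherwise the branch is pruned
    return best_value, best_x
```

On instance (4.1), `branch_and_bound([9, 5, 7, 3, 1], [[4, 3, 5, 2, 1], [4, 2, 3, 2, 1]], [10, 7])` returns the value 16 with x = [1, 0, 1, 0, 0]. Replacing the heap by a plain list used as a stack would give the depth-first variant of Fig. 4.1.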

In some cases, the number of possible values for the next variable $x_k$ to set is
large, especially when $x_k$ can take any integer value. A branching technique is
to consider the fractional value y taken by a variable $x_k$ in the relaxed solution and to
develop two branches, one with the additional constraint $x_k \le \lfloor y \rfloor$ and the other with
$x_k \ge \lfloor y \rfloor + 1$. In this case, the sets of fixed and free variables are unchanged on
Line 16. This technique was proposed by Dakin [2].

Another technique, known as branch and cut, is to add constraints to the relaxed
sub-problem. The goal of the new constraints is to remove the unfeasible solution
obtained by solving the sub-problem. For instance, such a constraint may prevent a
variable from taking a given fractional value.


4.1.1.1 Example of Implementation of a Branch and Bound 


A naive branch and bound implementation manages the sub-problem set as a stack
(policy α). This is performed automatically with a recursive procedure.

For the TSP, a solution is a permutation p of the n cities. The element $p_i$ provides
the ith city visited. Assume that the order of cities has been fixed up to and including
the ith one and that L is the set of cities remaining to be ordered. A lower bound on the
optimal tour length can be obtained by considering that:


1. The ith city is connected to the closest city of L.
2. Each city of L is connected to the closest other city of L.
3. The first city is connected to the closest city of L.


Going deeper in the enumeration, a valid tour is eventually obtained for the complete
problem. When only one "free" city remains (|L| = 1), we have to go to this one and then
return to the departure city. In this situation, a valid tour is obtained. The procedure
given by Code 4.1 returns a flag indicating whether a feasible tour is found.


Code 4.1 tsp_lower_bound.py Code for computing a naive lower bound to the optimal tour. The
procedure returns the bound and can alter the order of the last cities of the tour. In the event the
length of the modified tour is equal to the value of the lower bound, the procedure indicates that
the tour is optimal


######### Computation of a naive lower bound for the TSP
def tsp_lower_bound(d,      # Distance matrix
                    depth,  # tour[0] to tour[depth] fixed
                    tour):  # TSP tour
    n = len(tour)
    lb = 0  # Compute the length of the path for the cities already fixed in tour
    for j in range(depth):
        lb += d[tour[j]][tour[j + 1]]

    valid = 1  # valid is set to 1 if every closest successor of j builds a tour
    for j in range(depth, n - 1):  # Add the length to the closest free city of j
        minimum = d[tour[j]][tour[j + 1]]
        for k in range(n - 1, depth, -1):
            if k != j and minimum > d[tour[j]][tour[k]]:
                minimum = d[tour[j]][tour[k]]
                if k > j:  # Closest city not yet in the path: make it the successor
                    tour[k], tour[j + 1] = tour[j + 1], tour[k]
                else:      # Closest city already used: no valid tour
                    valid = 0
        lb += minimum

    minimum = d[tour[n - 1]][tour[0]]  # Come back to first city of the tour
    for j in range(depth + 1, n - 1):
        if minimum > d[tour[j]][tour[0]]:
            valid = 0
            minimum = d[tour[j]][tour[0]]
    lb += minimum

    return lb, tour, valid  # Lower bound, tour modified, valid: lb == tour length


To implicitly list all possible tours on n cities, an array and a depth
index can be used. From the depth index, all the possible permutations of the
last elements of the array are enumerated: each remaining city is tried in turn
for the depth entry, and the procedure is called recursively with depth + 1.
To prune the enumeration, no recursive call is performed either if the lower bound
computation provides an optimal tour or if the lower bound of the tour length is
not smaller than the length of a feasible tour already found. Code 4.2 implements an
implicit enumeration for the TSP.
Code 4.2 tsp_branch_and_bound.py Code for implicitly enumerating all the permutations of n
elements


from tsp_lower_bound import tsp_lower_bound  # Listing 4.1

######### Basic Branch & Bound for the TSP
def tsp_branch_and_bound(d,             # Distance matrix
                         depth,         # current_tour[0] to current_tour[depth] fixed
                         current_tour,  # Solution partially fixed
                         upper_bound):  # Upper bound on the optimum tour length
    n = len(current_tour)
    best_tour = current_tour[:]
    for i in range(depth, n):
        tour = current_tour[:]
        tour[depth], tour[i] = tour[i], tour[depth]
        lb, tour, valid = tsp_lower_bound(d, depth, tour)
        if upper_bound > lb:
            if valid:
                upper_bound = lb
                best_tour = tour[:]
                print("Improved: ", upper_bound, best_tour)
            else:
                # Keep the result of the sub-tree only if it improves the bound
                sub_tour, sub_bound = tsp_branch_and_bound(d, depth + 1, tour,
                                                           upper_bound)
                if sub_bound < upper_bound:
                    best_tour, upper_bound = sub_tour, sub_bound
    return best_tour, upper_bound


It should be noted here that this naive approach requires from a few seconds to a few
minutes to solve problems of up to 20 cities. However, this represents a significant
improvement over an exhaustive search, which would require a computing time
of several millennia. The relaxation based on the notion of 1-tree presented in
Sect. 3.1.1.2 could advantageously replace that provided by Code 4.1.
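As a hint of what such a replacement could look like, the sketch below computes the weight of a minimum 1-tree for a symmetric distance matrix: a minimum spanning tree on cities 1, ..., n − 1 (Prim, O(n²)) plus the two cheapest edges incident to city 0. Choosing city 0 as the special node is our assumption, and the node values λ of Sect. 3.1.1.2 are left out. Since every tour is a 1-tree, the result never exceeds the optimal tour length.

```python
def one_tree_bound(d):
    """Weight of a minimum 1-tree for a symmetric matrix d (n >= 3):
    minimum spanning tree on cities 1..n-1 plus the two cheapest edges
    incident to city 0, the special node."""
    n = len(d)
    in_tree = [False] * n
    key = [float("inf")] * n  # cheapest connection of each city to the tree
    key[1] = 0
    weight = 0
    for _ in range(n - 1):  # Prim's algorithm restricted to cities 1..n-1
        u = min((v for v in range(1, n) if not in_tree[v]), key=lambda v: key[v])
        in_tree[u] = True
        weight += key[u]
        for v in range(1, n):
            if not in_tree[v] and d[u][v] < key[v]:
                key[v] = d[u][v]
    weight += sum(sorted(d[0][v] for v in range(1, n))[:2])
    return weight
```

For four cities placed at the corners of a unit square, this bound equals 4, the length of the optimal tour.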

In recent years, so-called exact methods for solving integer linear programs have
made substantial progress. The key improvements are due to increasingly
sophisticated heuristics for computing relaxations and branching policies. Software
such as CPLEX or Gurobi includes methods based on metaheuristics for computing
bounds or obtaining good solutions. This allows a faster pruning of the enumeration
tree. Despite this, the computational time grows exponentially with the problem
size.


4.2 Random Construction


A rapid and straightforward method to obtain a solution is to generate it randomly
among the set of all feasible solutions. We clearly cannot hope to reliably find an
excellent solution this way. However, this method is widely implemented in iterative
local searches repeating a constructive phase followed by an improvement phase. It
should be noted here that the modeling of the problem plays a significant role, as
noted in Chap. 3. If finding a feasible solution is difficult, one must wonder
whether the problem modeling is adequate.

Note that it is not necessarily easy to write a procedure generating each solution 
of a feasible set with the same probability. Exercise 4.1 deals with the generation 
of a random permutation of n items. Naive approaches such as those given by 
Algorithms 4.5 and 4.6 can lead to non-uniform solutions and/or inefficient codes. 
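A standard way to draw a uniformly distributed permutation in O(n) time is the Fisher-Yates (Knuth) shuffle, sketched below; Python's own `random.shuffle` is based on the same idea. The function name and the optional `rng` parameter are ours.

```python
import random


def random_permutation(n, rng=random):
    """Fisher-Yates shuffle: each of the n! permutations of 0..n-1 is
    returned with probability 1/n!, in O(n) time."""
    p = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # uniform position among p[0..i], bounds included
        p[i], p[j] = p[j], p[i]
    return p
```

The key point is that position i is swapped with a position drawn uniformly among 0, ..., i only; drawing it among all n positions, a common mistake, yields n^n equally likely execution paths, which cannot be spread evenly over the n! permutations.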


4.3 Greedy Construction 


In Chap. 2, the first classical graph algorithms reviewed, namely Prim and
Kruskal for building a minimum spanning tree and Dijkstra for finding a shortest
path, were greedy algorithms. They build a solution by including one element
at every step. The element is permanently added on the basis of a function evaluating
its relevance for the partial solution under construction.

Assuming a solution is composed of elements $e \in E$ that can be added to a
partial solution s, the greedy algorithm decides which element to add by computing
an incremental cost function c(s, e). Algorithm 4.2 provides the framework of a
greedy constructive method.


Algorithm 4.2: Framework of a greedy constructive method. Strictly 
speaking, this is not an algorithm since different implementation options are 
possible, according to the definition of the set E of the elements constituting 
the solutions and the incremental cost function 


Input: A trivial partial solution s (generally ∅); set E of elements constituting a solution;
       incremental cost function c(s, e)
Result: Complete solution s

R ← E                                  // Elements that can be added to s
while R ≠ ∅ do
    ∀e ∈ R, compute c(s, e)
    Choose e′ optimizing c(s, e′)
    s ← s ∪ {e′}                       // Include e′ in the partial solution s
    Remove from R the elements that cannot be added any more to s
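Algorithm 4.2 can be transcribed almost literally, with the incremental cost and the admissibility test passed as functions. In this sketch (names ours), taking E as the edges of a graph, the edge weight as incremental cost, and rejecting edges that would close a cycle gives Kruskal's algorithm. The cycle test rebuilds the selected graph at each call, which is inefficient but keeps the example short.

```python
def greedy(elements, cost, can_add):
    """Framework of Algorithm 4.2 with user-supplied c(s, e) and admissibility."""
    s = []                      # trivial partial solution
    remaining = list(elements)  # R <- E
    while remaining:
        chosen = min(remaining, key=lambda e: cost(s, e))
        s.append(chosen)        # s <- s U {e'}
        # Remove from R the elements that cannot be added any more to s
        remaining = [e for e in remaining if e != chosen and can_add(s, e)]
    return s


def connected(edges, a, b):
    """True if a and b are already linked by the selected edges (DFS)."""
    adjacency = {}
    for _, u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    stack, seen = [a], {a}
    while stack:
        x = stack.pop()
        if x == b:
            return True
        for y in adjacency.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


edges = [(1, "a", "b"), (4, "a", "c"), (2, "b", "c"), (5, "c", "d"), (3, "b", "d")]
tree = greedy(edges,
              cost=lambda s, e: e[0],  # incremental cost: the edge weight
              can_add=lambda s, e: not connected(s, e[1], e[2]))
print(tree)  # [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'b', 'd')]
```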


Algorithms with significantly different performances can be obtained according 
to the definition of E and c(s, e). Considering the example of the Steiner tree, one 
could consider E as the set of edges of the problem and the incremental cost function 
as the weight of each edge. In this case, a partial solution is a forest. 

Another modeling could consider E as the set of Steiner points. The incremental cost
function would then be the weight of a minimum spanning tree containing all terminal
nodes plus e and the Steiner points already introduced in s.
We now provide some examples of greedy heuristics that have been proposed for
a few combinatorial optimization problems.


4.3.1 Greedy Heuristics for the TSP 


Countless greedy constructive methods have been proposed for the TSP. Here is a
selection illustrating the variety of definitions that can be made for the incremental
cost function.


4.3.1.1 Greedy on the Edges 


The most elementary way to design a greedy algorithm for the TSP is to consider that the
elements e to add to a partial solution s are the edges. The incremental cost function
is merely the edge weight. Initially, we start from a partial solution s = ∅. The set
R consists of the edges that can be added to the solution without creating a vertex
of degree greater than 2 or a cycle not including all the cities. Figure 4.2 illustrates how this
heuristic works on a small instance.
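A possible implementation of the greedy heuristic on the edges is sketched below, assuming a symmetric distance matrix and at least three cities. A union-find structure detects the edges that would close a cycle too early; the only cycle accepted is the one closing the tour once n − 1 edges have been selected.

```python
def greedy_edges_tsp(d):
    """Greedy construction on the edges for a symmetric TSP with n >= 3 cities.
    Returns the tour as a list of cities starting with city 0."""
    n = len(d)
    parent = list(range(n))  # union-find over the path fragments

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    degree = [0] * n
    neighbors = [[] for _ in range(n)]
    selected = 0
    for _, i, j in sorted((d[i][j], i, j) for i in range(n) for j in range(i + 1, n)):
        if degree[i] == 2 or degree[j] == 2:
            continue  # a vertex of degree 3 would be created
        if find(i) == find(j) and selected < n - 1:
            continue  # premature cycle not including all the cities
        parent[find(i)] = find(j)
        degree[i] += 1
        degree[j] += 1
        neighbors[i].append(j)
        neighbors[j].append(i)
        selected += 1
        if selected == n:
            break
    tour, previous = [0], None  # walk along the Hamiltonian cycle
    while len(tour) < n:
        a, b = neighbors[tour[-1]]
        nxt = a if a != previous else b
        previous = tour[-1]
        tour.append(nxt)
    return tour
```

Sorting the n(n − 1)/2 edges dominates the running time, which is O(n² log n).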


4.3.1.2 Nearest Neighbor 


One of the easiest greedy methods to program for the TSP is the nearest neighbor.
The elements to insert are the cities rather than the edges. A partial solution s is,
therefore, a path in which the cities are visited in the order of their insertion. The
incremental cost is the weight of the edge that connects the next city. Figure 4.3
illustrates the execution of this heuristic on the same instance as above. It is a
coincidence that the solution obtained is identical to that of the previous method.
Fig. 4.2 Steps of a greedy constructive method based on the edge weight for the TSP
Fig. 4.3 Running the nearest neighbor for a tiny TSP instance
The nearest neighbor greedy heuristic can be programmed very concisely, in
$O(n^2)$, where n is the number of cities (see Code 4.3).
Code 4.3 tsp_nearest_neighbor.py Nearest neighbor for the TSP. Note the similarities with the
implementation of the Dijkstra algorithm given by Code 2.1


######### Nearest Neighbor greedy heuristic for the TSP
def tsp_nearest_neighbor(d,     # Distance matrix
                         tour): # List of cities to order
    n = len(tour)
    length = 0  # Tour length
    for i in range(1, n):  # Cities from tour[0] to tour[i-1] are fixed
        nearest = i  # Next nearest city to insert
        cost_ins = d[tour[i - 1]][tour[i]]  # City insertion cost
        for j in range(i + 1, n):  # Find next city to insert
            if d[tour[i - 1]][tour[j]] < cost_ins:
                cost_ins = d[tour[i - 1]][tour[j]]
                nearest = j
        length += cost_ins
        tour[i], tour[nearest] = tour[nearest], tour[i]  # Definitive insertion

    length += d[tour[n - 1]][tour[0]]  # Come back to start

    return tour, length


4.3.1.3 Largest Regret 


A weakness of the nearest neighbor heuristic is that it temporarily forgets a few
cities, which subsequently causes significant detours. This is exemplified in Fig. 4.3.
To try to prevent this kind of situation, we can evaluate the increased cost of not
visiting city e just after the last city i of the partial path s. In any case, city e must
appear in the final tour. This will cost at least $\min_{j,k \in R} (d_{je} + d_{ek})$.
Conversely, if e is visited just after i, the cost is at least $\min_{r \in R} (d_{ie} + d_{er})$.
The largest regret greedy constructive method chooses the city e maximizing
$c(s, e) = \min_{j,k \in R} (d_{je} + d_{ek}) - \min_{r \in R} (d_{ie} + d_{er})$.
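The largest regret criterion can be sketched as below, assuming a distance matrix `d` and a path starting at city `start`; the function name is hypothetical. Two conventions are assumptions not stated in the text: j and k are taken distinct and different from e, and when fewer than three cities remain (the regret is then undefined), the nearest city is chosen. The brute-force evaluation costs $O(n^3)$ per step, so this sketch only suits small instances.

```python
# Illustrative sketch; the function name and tie-breaking rules are hypothetical.
def largest_regret_tsp(d, start=0):
    """Build a path by largest regret, then close it into a tour."""
    n = len(d)
    path = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        i = path[-1]  # Last city of the partial path s
        if len(remaining) < 3:
            # Fewer than two other cities: regret undefined, take the nearest
            e = min(sorted(remaining), key=lambda c: d[i][c])
        else:
            def regret(c):
                others = remaining - {c}
                # Cheapest way to visit c later, between two other remaining cities
                later = min(d[j][c] + d[c][k] for j in others for k in others if j != k)
                # Cheapest way to visit c right now, just after i
                now = min(d[i][c] + d[c][r] for r in others)
                return later - now
            e = max(sorted(remaining), key=regret)  # Ties: smallest city number
        path.append(e)
        remaining.remove(e)
    length = sum(d[path[k]][path[(k + 1) % n]] for k in range(n))
    return path, length
```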


4.3.1.4 Cheapest Insertion 


The cheapest insertion heuristic inserts the cities one by one into a partial tour.
The set E consists of the cities, and the trivial initial tour is a cycle on the two
nearest cities. The incremental cost c(s, e) of a city e is the minimum detour that
must be accepted to insert e into the partial tour s between two successive cities of s.
Figure 4.4 illustrates this greedy method.
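A direct transcription of cheapest insertion is sketched below, assuming a symmetric distance matrix `d`; `cheapest_insertion_tsp` is a hypothetical name. Detours are recomputed from scratch at each step, giving $O(n^3)$ overall; caching the best insertion of each city would reduce this, but is omitted for clarity.

```python
# Illustrative sketch; the function name is hypothetical.
def cheapest_insertion_tsp(d):
    """Cheapest insertion for a symmetric TSP given by distance matrix d."""
    n = len(d)
    # Trivial initial tour: a cycle on the two nearest cities
    a, b = min(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda e: d[e[0]][e[1]])
    tour = [a, b]
    remaining = set(range(n)) - {a, b}

    def cheapest_detour(c):
        """(minimum detour, insertion position) of city c in the current tour."""
        m = len(tour)
        return min((d[tour[p]][c] + d[c][tour[(p + 1) % m]]
                    - d[tour[p]][tour[(p + 1) % m]], p + 1) for p in range(m))

    while remaining:
        # Incremental cost c(s, e): detour of the best insertion of e
        _, pos, c = min(cheapest_detour(x) + (x,) for x in remaining)
        tour.insert(pos, c)
        remaining.remove(c)
    length = sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n))
    return tour, length
```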
Fig. 4.4 Running the cheapest insertion for a tiny TSP instance
Fig. 4.5 Running the farthest insertion for a tiny TSP instance


4.3.1.5 Farthest Insertion 


The farthest insertion heuristic is similar to the previous one, but it selects the city
whose insertion causes the most significant detour. However, each city is inserted
at the best possible place in the tour. Figure 4.5 illustrates this greedy method. It
seems counter-intuitive to choose the most problematic city at each step. However,
this type of construction proves less myopic and frequently produces better final
solutions than the previous heuristics.
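Farthest insertion differs from cheapest insertion only in its selection rule. The version below is a minimal sketch assuming the same initial cycle on the two nearest cities (the text does not specify the starting tour), a symmetric matrix `d`, and a hypothetical function name.

```python
# Illustrative sketch; the function name and initial tour are assumptions.
def farthest_insertion_tsp(d):
    """Farthest insertion for a symmetric TSP given by distance matrix d."""
    n = len(d)
    # Assumed initial tour: a cycle on the two nearest cities
    a, b = min(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda e: d[e[0]][e[1]])
    tour = [a, b]
    remaining = set(range(n)) - {a, b}

    def cheapest_detour(c):
        """(minimum detour, insertion position) of city c in the current tour."""
        m = len(tour)
        return min((d[tour[p]][c] + d[c][tour[(p + 1) % m]]
                    - d[tour[p]][tour[(p + 1) % m]], p + 1) for p in range(m))

    while remaining:
        # Select the city whose best insertion causes the largest detour...
        c = max(sorted(remaining), key=lambda x: cheapest_detour(x)[0])
        # ...but insert it at the best possible place
        _, pos = cheapest_detour(c)
        tour.insert(pos, c)
        remaining.remove(c)
    length = sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n))
    return tour, length
```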

We have presented here only a limited range of the greedy constructive methods
proposed for the TSP. The quality of the solutions they produce varies. It is usually
not difficult to find problem instances for which a greedy heuristic is misguided and
makes increasingly bad choices. On points uniformly distributed in the Euclidean
plane, they typically provide solutions a few tens of percent above the optimum.


4.3.2 Greedy Heuristic for Graph Coloring 


After reviewing several methods for the TSP, let us present a not-too-naive example
for another problem.

A relatively elaborate greedy method for coloring the vertices of a graph tries
to determine the vertex for which assigning a color may be the most problematic.
The DSatur [1] method assumes it is the vertex whose already colored neighbors use
the broadest color palette. For this purpose, the saturation degree of a vertex v,
denoted DS(v), is defined as the number of different colors used by the vertices
adjacent to v. At equal saturation degree, particularly at the start when no vertex is
colored, the vertex with the highest degree is selected. Ties on both degree and
saturation degree are broken arbitrarily. Algorithm 4.3 formalizes this greedy method.
Algorithm 4.3: DSatur algorithm for graph coloring. The greedy criterion
used by this algorithm is the saturation degree of the vertices, corresponding
to the number of different colors used by adjacent nodes


Input: Undirected graph G = (V, E);
Result: Vertex coloring
Color with 1 the vertex v with the highest degree
R ← V \ v
colors ← 1
while R ≠ ∅ do
    ∀v ∈ R, compute DS(v)
    Choose v' maximizing DS(v'), with the highest possible degree
    Find the smallest k (1 ≤ k ≤ colors + 1) such that color k is feasible for v'
    Assign color k to v'
    if k > colors then
        colors ← k
    R ← R \ v'
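Algorithm 4.3 can be transcribed as the short sketch below. The graph is assumed to be given as a list of neighbor sets `adj` (a hypothetical representation), colors are numbered from 1, and remaining ties are broken by the smallest vertex number, which is one arbitrary choice among others; the function name is hypothetical.

```python
# Illustrative sketch of Algorithm 4.3; names and graph representation are hypothetical.
def dsatur(adj):
    """adj[v]: set of neighbors of vertex v. Returns color[v] in 1, 2, ..."""
    n = len(adj)
    color = [0] * n  # 0 means "not yet colored"
    uncolored = set(range(n))
    while uncolored:
        def priority(v):
            saturation = len({color[u] for u in adj[v]} - {0})  # DS(v)
            return (saturation, len(adj[v]))  # Ties on DS broken by degree
        v = max(sorted(uncolored), key=priority)  # Remaining ties: smallest v
        used = {color[u] for u in adj[v]}
        k = 1
        while k in used:  # Smallest feasible color, at most colors + 1
            k += 1
        color[v] = k
        uncolored.remove(v)
    return color
```

On a 5-cycle given by `[{1, 4}, {0, 2}, {1, 3}, {2, 4}, {3, 0}]`, this sketch returns the coloring `[1, 2, 1, 2, 3]`; on an even cycle it uses two colors.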
4.4 Improvement of Greedy Procedures


The chief drawback of a greedy construction is that it never changes a choice 
performed in a myopic way. Conversely, the shortcoming of a complete enumerative 
method is the exponential growth of the computational effort with the problem 
size. To limit this growth, it is therefore necessary to limit the branching. This is 
typically achieved on the basis of greedy criteria. This section reviews two partial 
enumeration techniques that have been proposed to improve a greedy algorithm. 

First, beam search was proposed in the context of a speech recognition
application [5]. Second, the more recent pilot method was proposed by Duin and
Voß [3] and presented as a new metaheuristic. Other frameworks have been derived
from it [4].


4.4.1 Beam Search 


Beam search is a partial breadth-first search. Instead of keeping all the branches,
at most p are kept at each level, on the basis of the incremental cost function c(s, e).
On reaching level k, the partial solution is completed with the first-level element e
that leads to the best solution at the last enumerated level. Figure 4.6 illustrates
the principle of a beam search.
[Figure: partial initial solution; element added to the initial solution; p best candidates at levels 1 and 2; best candidate at level k]

Fig. 4.6 Beam search with p = 3 and k = 3. Before definitively choosing the element to insert in
the partial solution, a breadth-first search is carried out up to a depth of k, only retaining the p best
candidates at each depth


A beam search variant proceeds by making a complete enumeration up to a level 
containing more than p nodes. The p best of them are retained to generate the 
candidates for the next level. 
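One possible reading of the beam search principle, applied to building a TSP path city by city, is sketched below. Assumptions: branches are ranked by the accumulated length of their extension (the sum of the incremental costs along the branch), the tour is closed at the end, and the function name is hypothetical. With p = 1 and k = 1, the procedure reduces to the nearest neighbor heuristic.

```python
# Illustrative sketch; ranking rule and names are assumptions.
def beam_search_tsp(d, p=3, k=3, start=0):
    """Build a tour city by city; before each choice, explore k levels,
    keeping at most p branches per level."""
    n = len(d)
    tour = [start]
    while len(tour) < n:
        visited = set(tour)
        # Beam entries: (cost of the extension, first-level city, extension)
        beam = [(0, None, [])]
        for _ in range(min(k, n - len(tour))):
            candidates = []
            for cost, first, ext in beam:
                last = ext[-1] if ext else tour[-1]
                for c in range(n):
                    if c not in visited and c not in ext:
                        candidates.append((cost + d[last][c],
                                           c if first is None else first,
                                           ext + [c]))
            candidates.sort(key=lambda entry: entry[0])  # Stable sort keeps ties in order
            beam = candidates[:p]  # Keep at most p branches at this level
        tour.append(beam[0][1])  # First-level element leading to the best branch
    length = sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return tour, length
```

For cities on a line at abscissae 0, 1, 5, 2, 8, `beam_search_tsp(d, p=1, k=1)` returns the nearest neighbor tour `[0, 1, 3, 2, 4]` of length 16.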


4.4.2 Pilot Method


The framework of the pilot method requires a so-called pilot heuristic to fully 
complete a partial solution. This pilot heuristic can be a simple greedy method, 
for example, the nearest neighbor heuristic for the TSP, but it can equally be a much 
more sophisticated method, such as one of those presented in the following chapters. 

The pilot method enumerates all the partial solutions that can be obtained by
adding one element to the starting solution. The pilot heuristic is then applied to
each of these partial solutions, producing as many complete solutions. The partial
solution that led to the best complete solution becomes the new starting solution,
and the process is repeated until there is nothing more to add. Figure 4.7 illustrates
two steps of the method.

Algorithm 4.4 specifies how the pilot metaheuristic works. In this framework,
the final "partial" solution represents a feasible complete solution, which is not
necessarily the solution returned by the algorithm. Indeed, the pilot heuristic can
generate a complete solution that does not necessarily contain the elements of the
initial partial solution, especially if it includes an improvement technique more
elaborate than a simple greedy constructive method.
[Figure: initial partial solution; element added to the initial solution; next partial solution; pilot heuristic completion; best complete solution]

Fig. 4.7 Pilot method. An element is included in the partial solution; then a pilot constructive
heuristic is applied to fully complete it. The process is repeated with another element added to the
partial solution. The element finally inserted is the one that led to the best complete solution


Algorithm 4.4: Frame of a pilot method


Input: s_p, trivial partial solution; set E of elements constituting a solution; pilot heuristic
h(s_e) for completing a partial solution s_e; fitness function f(s)
Result: Complete solution s*


R ← E // Elements that can be added to s_p
while R ≠ ∅ do
    v ← ∞
    forall e ∈ R do
        Complete s_p with e to get s_e
        Apply h(s_e) to get a complete solution s
        if f(s) < v then
            v ← f(s)
            s_r ← s_e
        if s is better than s* then // Store the improved solution
            s* ← s
    s_p ← s_r // Add an element to the partial solution s_p
    Remove from R the elements that cannot properly be added to s_p


Code 4.4 provides an implementation of the pilot method for the TSP. The pilot 
heuristic is the nearest neighbor. 
Code 4.4 tsp_pilot_nearest_neighbor.py Implementation of a pilot method with the nearest
neighbor (Code 4.3) as pilot heuristic


from tsp_utilities import tsp_length            # Listing 12.2
from tsp_nearest_neighbor import *              # Listing 4.3

######### Constructive algorithm with Nearest Neighbor as Pilot method
def tsp_pilot_nearest_neighbor(n,  # Number of cities
                               d): # Distance matrix
    tour = [i for i in range(n)]   # All cities must be in tour

    for q in range(n - 1):         # Cities up to q at their final position
        length_r = tsp_length(d, tour)
        to_insert = q
        for r in range(q, n):      # Choose next city to insert at position q
            sol = [tour[i] for i in range(n)]   # Copy of tour in sol
            sol[q], sol[r] = sol[r], sol[q]     # Tentative city at position q
            sol[q:n], _ = tsp_nearest_neighbor(d, sol[q:n])
            tentative_length = tsp_length(d, sol)
            if length_r > tentative_length:
                length_r = tentative_length
                to_insert = r

        # Put definitively to_insert at position q
        tour[q], tour[to_insert] = tour[to_insert], tour[q]

    return tour, tsp_length(d, tour)


Problems 


4.1 Random Permutation 
Write a procedure to generate a random permutation of n elements contained in an
array p. Each element must have a probability 1/n of appearing at any given
position of p. Describe the inadequacy of Algorithms 4.5 and 4.6.


Algorithm 4.5: Bad algorithm to generate a random permutation of n
elements

Input: A set of n elements e_1, ..., e_n
Result: A permutation p of the elements

i ← 0 // Number of elements already chosen
while i ≠ n do
    Draw a random number u uniformly between 1 and n
    if e_u is not already chosen then
        i ← i + 1
        p_i ← e_u
Algorithm 4.6: Another bad algorithm to generate a random permutation of
n elements


Input: A set of n elements e_1, ..., e_n
Result: A permutation p of the elements
i ← 0 // Number of elements already chosen
while i ≠ n do
    Draw a random number u uniformly between 1 and n
    i ← i + 1
    if e_u is already chosen then
        Find the next u' such that e_u' is not chosen
        p_i ← e_u'
    else
        p_i ← e_u


4.2 Greedy Algorithms for the Knapsack 
Propose three different greedy algorithms for the knapsack problem. 


4.3 Greedy Algorithm for the TSP on the Delaunay 

We want to build the tour of a traveling salesman (on the Euclidean plane) using 
only edges belonging to the Delaunay triangulation. Is this always possible? If this 
is not the case, provide a counter-example; otherwise, propose a greedy method and 
analyze its complexity. 


4.4 TSP with Edge Subset 

To speed up a greedy method for the TSP, only the 40 shortest edges adjacent to 
each vertex are considered. Is this likely to reduce the algorithmic complexity of the 
method? Can this cause some issues? 


4.5 Constructive Method Complexity 

What is the complexity of the nearest neighbor heuristic for the TSP? Same question
if this heuristic is used in a beam search retaining p nodes at each depth and going
k levels down. Similar question for the pilot method, where the nearest neighbor is
also employed as the pilot heuristic.


4.6 Beam Search and Pilot Method Applications 

We consider a TSP instance on five cities. Table 4.1 gives its distance matrix.
Apply a beam search to this instance, starting from city 1. At each level, only
p = 2 nodes are retained, and the tree is developed up to k = 3 levels down.
Apply a pilot method to this instance, considering the nearest neighbor as the
pilot heuristic.
Table 4.1 Distance matrix
for Problem 4.6


4.7 Greedy Algorithm Implementation for Scheduling 
Propose two greedy heuristics for the permutation flowshop problem. Compare their 
quality on problem instances from the literature. 


4.8 Greedy Methods for the VRP 
Propose two greedy heuristic methods for the vehicle routing problem. 
Chapter 5
Local Search


Improvement techniques are a cornerstone in the design of heuristics. As will be 
seen later, most metaheuristics incorporate a local search. 

By examining the solutions produced by greedy constructive heuristics like those
presented in the previous chapter, we immediately notice that they are not optimal.
For example, the solution of the Euclidean traveling salesman problem obtained by
the nearest neighbor heuristic, shown in Fig. 4.3, has intersecting edges, which is
obviously suboptimal. Indeed, the two intersecting edges can be replaced by two
others whose total length is lower while preserving a tour. This replacement of two
edges by two shorter ones can then be repeated until the solution can no longer be
improved by the same process, as shown in Fig. 5.1.


5.1 Local Search Framework 


The general idea is therefore to start from a solution obtained with a constructive
method and to improve it locally. The process is repeated until no further improvement
is achieved. This framework is well known in continuous optimization, where gradient
methods seek an optimum of a differentiable function. The gradient concept for
finding an improving direction does not exist in discrete optimization; it is replaced
by the definition of “minor” changes of the solution, called moves, or by the concept
of neighborhood.

To be able to apply Algorithm 5.1, it is necessary to define how to obtain the
neighbor solutions. Formally, a neighborhood set $N(s) \subset S$ must be defined for any
solution $s \in S$. Therefore, the search for a modification of s implies enumerating
the solutions of N(s) to extract one of them, s', which is better than s.

A convenient way to define a neighborhood N(s) is to specify the modifications,
commonly called moves, that can be applied to the solution s. In the example of
Fig. 5.1 for the TSP, a move m can be specified by a pair of cities [i, j]. It consists
in replacing the edges $[i, s_i]$ and $[j, s_j]$ by the edges $[i, j]$ and $[s_i, s_j]$, where $s_i$ and
$s_j$ are, respectively, the cities that follow i and j in the solution s.

Fig. 5.1 Successive improvements of a TSP solution with the 2-opt local search. Two edges (in
color) are replaced by two others (dotted), and the sum of their lengths is lower


Algorithm 5.1: General framework of a local improvement method. It is
assumed that a solution to the problem is available, as well as a method able to
generate a number of other solutions from any solution
Input: Solution s, method modifying a solution
Result: Improved solution s
repeat
    if there is a modification of s into s' improving s then
        s ← s'
until no improvement of s is found

This neighborhood, which replaces two edges by two others, is known in the literature
as 2-exchange or 2-opt [2]. The set M(s) of 2-opt moves that can be applied to the
solution s can be formally defined by $M(s) = \{[i, j] : i, j \in s,\ i \neq j,\ j \neq s_i,\ i \neq s_j\}$.
Applying a move $m \in M(s)$ to the solution s is sometimes denoted $s \oplus m$.

The neighborhood can then be defined from the set of moves:
$N(s) = \{s' \mid s' = s \oplus m,\ m \in M(s)\}$. The size of the 2-opt neighborhood
is $|N(s)| = \Theta(|s|^2)$. An improvement method for the TSP can therefore reasonably
be implemented by enumerating the neighbor solutions. This enumeration can follow
one of two policies: first improvement or best improvement.
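The move notation $s \oplus m$ can be made concrete on a tour stored as a list of cities. In this sketch, a move is identified by two positions (i, j) in the list rather than by two cities, an assumption made for simplicity; the edges following positions i and j are replaced, which amounts to reversing the sub-path between positions i + 1 and j. A symmetric distance matrix is assumed, and all names are hypothetical.

```python
# Illustrative sketch; names are hypothetical. Symmetric distances assumed.
def tour_length(d, tour):
    n = len(tour)
    return sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n))


def two_opt_moves(n):
    """M(s) as pairs of positions (i, j): edges [tour[i], tour[i+1]] and
    [tour[j], tour[j+1]] are replaced; adjacent edges are excluded."""
    return [(i, j) for i in range(n - 2) for j in range(i + 2, n)
            if i > 0 or j < n - 1]


def two_opt_delta(d, tour, i, j):
    """Length variation produced by the move, in constant time."""
    n = len(tour)
    a, b = tour[i], tour[i + 1]
    c, e = tour[j], tour[(j + 1) % n]
    return d[a][c] + d[b][e] - d[a][b] - d[c][e]


def apply_2opt(tour, i, j):
    """s (+) m: replace [a, b], [c, e] by [a, c], [b, e], i.e. reverse tour[i+1..j]."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
```

Enumerating `two_opt_moves(n)` yields n(n - 3)/2 moves, consistent with $|N(s)| = \Theta(|s|^2)$.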


5.1.1 First Improvement Heuristic 


With this policy, the current solution is immediately changed as soon as an improv- 
ing move is identified. The neighborhood is therefore not thoroughly examined at 
each iteration. This policy is therefore aggressive. It allows a solution to be improved 
quickly. It generally leads to a greater number of changes to the initial solution than 
the best improvement policy. The framework of the first improvement method is 
provided by Algorithm 5.2. 
Algorithm 5.2: Framework of the first improvement heuristic


Input: Solution s, neighborhood specification N(·), fitness function f(·) to minimize.
Result: Improved solution s
forall s' ∈ N(s) do
    if f(s') < f(s) then // Move to s', break the loop and initiate the next one
        s ← s'
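Algorithms 5.1 and 5.2 combine into a generic first improvement loop. In the sketch below, the neighborhood is passed as a function returning an iterable of neighbor solutions, and f is the fitness function to minimize; both are assumptions about the interface, and the function name is hypothetical.

```python
# Illustrative sketch of Algorithms 5.1 + 5.2; the interface is hypothetical.
def first_improvement(s, neighbors, f):
    """Move to the first improving neighbor found; stop at a local optimum."""
    fs = f(s)
    improved = True
    while improved:  # Algorithm 5.1: repeat until no improvement is found
        improved = False
        for t in neighbors(s):  # Algorithm 5.2: scan N(s)
            ft = f(t)
            if ft < fs:  # First improving neighbor: move to it...
                s, fs = t, ft
                improved = True
                break  # ...and initiate the next scan from the new solution
    return s, fs
```

For instance, with `values = [5, 3, 4, 1, 2]` as a function of a discrete variable x whose neighbors are x - 1 and x + 1, starting from x = 0 stops at the local minimum x = 1 (value 3), whereas starting from x = 4 reaches the global minimum x = 3 (value 1).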


5.1.2 Best Improvement Heuristic 


With the best improvement policy, the neighborhood is thoroughly examined at each
iteration. The best neighbor solution identified becomes the current solution for the
subsequent iteration, provided it improves on it. Algorithm 5.3 formalizes this policy.


Algorithm 5.3: Framework of the best improvement method


Input: Solution s, neighborhood specification N(·), fitness function f(·) to minimize.
Result: Improved solution s

repeat
    end ← true
    best_neighbor_value ← ∞
    forall s' ∈ N(s) do
        if f(s') < best_neighbor_value then // A better neighbor is found
            best_neighbor_value ← f(s')
            best_neighbor ← s'
    if best_neighbor_value < f(s) then // Move to the improved solution
        s ← best_neighbor
        end ← false
until end


This policy performs more work between two successive changes to the solution. The
improvements are therefore larger and fewer in number. This policy is less frequently
used in metaheuristics. We will see later that tabu search is based on this framework.
Indeed, this technique tries to learn how to modify a solution smartly. It is therefore
appropriate to examine the neighborhood thoroughly rather than rushing to the first
small improvement encountered.

However, for some problems, it is worthwhile to exploit this policy. For the
quadratic assignment problem, it can be shown that evaluating a single neighbor
solution takes a time proportional to n, whereas it is possible to evaluate the whole
set of $O(n^2)$ neighbor solutions in $O(n^2)$. With the same computational effort, it is
therefore possible to evaluate more solutions with the best improvement policy.
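To illustrate the $O(n)$ evaluation of a single neighbor, the sketch below assumes the common QAP formulation minimizing $\sum_{i,j} a_{ij} b_{p_i p_j}$, where `a` holds the flows between units, `b` the distances between locations, and `p[i]` the location of unit i, with a neighborhood that exchanges the locations of two units r and s. Only the terms in rows or columns r and s change, so the cost difference needs $O(n)$ operations. The incremental technique that evaluates the whole neighborhood in $O(n^2)$ is not shown, and the names are hypothetical.

```python
# Illustrative sketch; the QAP formulation and names are assumptions.
def qap_cost(a, b, p):
    """Cost sum_{i,j} a[i][j] * b[p[i]][p[j]]; p[i] is the location of unit i."""
    n = len(p)
    return sum(a[i][j] * b[p[i]][p[j]] for i in range(n) for j in range(n))


def qap_swap_delta(a, b, p, r, s):
    """Cost change when units r and s exchange their locations, in O(n)."""
    n = len(p)

    def q(x):  # Location of unit x after the exchange
        return p[s] if x == r else p[r] if x == s else p[x]

    delta = 0
    for i in (r, s):          # Rows r and s: every column
        for j in range(n):
            delta += a[i][j] * (b[q(i)][q(j)] - b[p[i]][p[j]])
    for j in (r, s):          # Columns r and s: remaining rows only
        for i in range(n):
            if i != r and i != s:
                delta += a[i][j] * (b[q(i)][q(j)] - b[p[i]][p[j]])
    return delta
```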
Sometimes, the set N(s) is so large that it is enumerated either implicitly, to extract
the best neighbor solution, or heuristically; we will come back to this in Chap. 6, in
particular with the large neighborhood search technique (Sect. 6.4.1).

Code 5.1 implements the best improvement framework for the TSP. The algo- 
rithm seeks the best replacement of two edges by two others. 


Code 5.1 tsp_2opt_best.py Implementation of a best improvement method for the TSP with
2-opt neighborhood


######### Local search with 2-opt neighborhood and best improvement policy
def tsp_2opt_best(d,       # Distance matrix
                  tour,    # Solution
                  length): # Solution length
    n = len(tour)
    best_delta = -1
    while best_delta < 0:
        best_delta = float("inf")
        best_i = best_j = -1           # Best move to perform
        for i in range(n - 2):
            j = i + 2
            while j < n and (i > 0 or j < n - 1):
                delta = \
                    d[tour[i]][tour[j]] + d[tour[i+1]][tour[(j+1) % n]] \
                  - d[tour[i]][tour[i+1]] - d[tour[j]][tour[(j+1) % n]]
                if delta < best_delta:
                    best_delta = delta
                    best_i, best_j = i, j
                j += 1                 # Next neighbor

        if best_delta < 0:             # Perform best move if it improves best solution
            length += best_delta       # Update solution cost
            i, j = best_i + 1, best_j  # Reverse path from best_i+1 to best_j
            while i < j:
                tour[i], tour[j] = tour[j], tour[i]
                i, j = i + 1, j - 1

    return tour, length


5.1.3 Local Optima 


By applying Algorithm 5.1 or one of its variants (5.2 or 5.3), a solution that is
locally optimal with respect to the neighborhood used is obtained. However, there is
no guarantee that the returned solution is the best one for the particular instance.
Globally optimal solutions must therefore be distinguished from those which are only
locally optimal. It should be emphasized that a local optimum relative to one
neighborhood is not necessarily a local optimum for another neighborhood (see
Problem 5.1). A plateau is a set of neighboring solutions all having the same value.
Figure 5.2 illustrates the notions of local optimum, plateau, and global optimum.

Objective functions such as min(max(...)) generate many plateaus with many 
solutions. Their optimization with a local search is therefore difficult. Indeed, all 
solutions in a plateau are local optima. 
Fig. 5.2 Local minima, global minimum, and plateau of a function of a discrete variable x relative
to a neighborhood consisting in changing x by one unit


However, for some problems, this framework provides globally optimal solutions.
These include, in particular, the shortest path problem with the Bellman-Ford
algorithm (Algorithm 2.4) and linear programming with the simplex algorithm.

Since the set of solutions to a combinatorial problem is finite and Algorithms 5.2
and 5.3 only modify the solution when it is strictly improved, these algorithms
terminate after a finite time. However, their computation time is not necessarily
polynomial, even if the size of the neighborhood is. In practice, as with the simplex
algorithm, such a degeneration is not observed.

Figure 5.3 shows the evolution of the average computing time for two methods 
applied to the TSP (the best improvement and first improvement policies) as a 
function of the number of cities. The starting solution is randomly generated, and 
the instances are selected from the 2D Euclidean ones of the TSPLIB [11]. 


5.1.3.1 TSP 3-Opt 


The beginning of this chapter presents a local search for the TSP based on replacing 
two edges by two others. It is naturally possible to define other neighborhoods, for 
instance, replacing three edges (or arcs in the case of a non-symmetrical problem) 
by three others. Figure 5.4 shows this type of move, called 3-opt. 
Fig. 5.3 Evolution of the computational time as a function of the number of TSP cities for
two variants of improvement methods. Both scales are logarithmic. Hence, computational times
approximately aligned on a straight line indicate a polynomial dependence


Fig. 5.4 3-opt move. Three arcs are replaced by three others. They connect three sub-paths 
traversed in the same direction, before and after modification. Another way to apprehend the 3- 
opt neighborhood is the displacement of a sub-path elsewhere in the tour 
Code 5.2 tsp_3opt_first.py Implementation of an improvement method based on 3-opt
neighborhood. In practice, its complexity in $\Omega(n^3)$ is excessive for tackling instances with
several hundred cities


######### Local search with 3-opt neighborhood and first improvement policy
def tsp_3opt_first(d,       # Distance matrix
                   succ,    # Solution
                   length): # Solution length
    last_i, last_j, last_k = 0, succ[0], succ[succ[0]]
    i, j, k = last_i, last_j, last_k
    while True:
        delta = d[i][succ[j]] + d[j][succ[k]] + d[k][succ[i]] \
               -d[i][succ[i]] - d[j][succ[j]] - d[k][succ[k]]  # Move cost
        if delta < 0:                            # Is there an improvement?
            length += delta                      # Update solution cost
            succ[i], succ[j], succ[k] = succ[j], succ[k], succ[i]  # Perform move
            j, k = k, j                          # Replace j between i and k
            last_i, last_j, last_k = i, j, k
        k = succ[k]                              # Next k
        if k == i:                               # k at its last value, next j
            j = succ[j]; k = succ[j]
        if k == i:                               # j at its last value, next i
            i = succ[i]; j = succ[i]; k = succ[j]
        if i == last_i and j == last_j and k == last_k:
            break

    return succ, length


An attractive property of this neighborhood is that it does not change the traversal
direction of the sub-paths between the three nodes whose successors are modified. In
the case of symmetrical problems, there are several ways to reconnect the three
sub-paths (see Problem 5.4).

Representing a solution by a permutation s whose element $s_i$ indicates the city
visited just after city i, a 3-opt move can be implemented in constant time. Checking
that a solution is 3-optimal takes $O(n^3)$ time. Hence, without neighborhood reduction
techniques, this approach can only handle relatively small instances. Code 5.2
implements a local search for the TSP with 3-opt moves. It applies the first
improvement policy.


5.1.3.2 TSP Or-Opt


Another type of neighborhood proposed by Or [8] is to modify the visiting order of 
a few consecutive cities. The idea is to examine whether it is pertinent to place three 
successive cities somewhere else in the current tour. 

The originality of the method proposed by Or lies in exploiting several
neighborhoods. Once the solution can no longer be improved by moving three
consecutive cities, we try to move only two. As soon as moving two cities improves
the solution, we return to the first neighborhood. When the solution is locally
optimal with respect to these two neighborhoods, we try changing the position of a
single city. Figure 5.5 illustrates a possible Or-opt move.
Fig. 5.5 Or-opt move, where three successive vertices are moved within the tour. It may also be
worthwhile to reverse the direction of the moved sub-path


This neighborhood is distinct from a restriction of the 3-opt neighborhood in
which one of the sub-paths would be limited to 3, 2, or 1 city. Indeed, the Or
neighborhood also tries to reverse the direction of the moved sub-path. Testing
whether a tour is Or-optimal takes $\Theta(n^2)$ time.
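Or's strategy can be sketched on a tour stored as a list. Assumptions: distances are symmetric (so a moved sub-path can be reversed without changing its internal length), and a move is described by the position i of the first moved city, the sub-path length (3, 2, or 1), the position j of the city after which it is reinserted, and whether it is reversed. All names are hypothetical. Each move is evaluated in constant time, and one full scan of a neighborhood costs $O(n^2)$.

```python
# Illustrative sketch; all names are hypothetical. Symmetric distances assumed.
def tour_length(d, tour):
    n = len(tour)
    return sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n))


def or_opt_moves(n, seg_len):
    """Moves (i, seg_len, j, reverse): the seg_len cities starting at position i
    are reinserted between the cities at positions j and j + 1."""
    for i in range(n):
        seg = {(i + k) % n for k in range(seg_len)}
        for j in range(n):
            if j not in seg and (j + 1) % n not in seg:
                for reverse in (False, True):
                    yield i, seg_len, j, reverse


def or_opt_delta(d, tour, i, seg_len, j, reverse):
    """Cost change of an Or-opt move, evaluated in constant time."""
    n = len(tour)
    s0, s1 = tour[i], tour[(i + seg_len - 1) % n]    # Ends of the moved sub-path
    prev, nxt = tour[(i - 1) % n], tour[(i + seg_len) % n]
    u, v = tour[j], tour[(j + 1) % n]                 # Insertion edge [u, v]
    removed = d[prev][s0] + d[s1][nxt] - d[prev][nxt]
    first, last = (s1, s0) if reverse else (s0, s1)   # Possibly reversed sub-path
    added = d[u][first] + d[last][v] - d[u][v]
    return added - removed


def apply_or_opt(tour, i, seg_len, j, reverse):
    n = len(tour)
    seg = [tour[(i + k) % n] for k in range(seg_len)]
    rest = [c for c in tour if c not in seg]          # Tour without the sub-path
    pos = rest.index(tour[j]) + 1                     # Reinsert just after tour[j]
    return rest[:pos] + (seg[::-1] if reverse else seg) + rest[pos:]


def or_opt(d, tour):
    """Try sub-paths of 3 cities, then 2, then 1; back to 3 after any improvement."""
    seg_len = 3
    while seg_len >= 1:
        move = next((m for m in or_opt_moves(len(tour), seg_len)
                     if or_opt_delta(d, tour, *m) < 0), None)
        if move is None:
            seg_len -= 1          # Locally optimal for this sub-path length
        else:
            tour = apply_or_opt(tour, *move)
            seg_len = 3           # Improvement: return to the first neighborhood
    return tour
```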


5.1.3.3 Data Structure for TSP 2-Opt 


The 2-opt neighborhood reverses the direction of a sub-path. Visually, this seems
innocuous for a symmetrical instance. However, this is not the case for the computer
representation of a solution. A data structure inspired by the work of [9] enables
performing a 2-opt move in constant time. For each city, an array stores both
adjacent cities: the components at indices 2i and 2i + 1 provide the two cities
adjacent to city i.

An array t with 2n entries represents a solution. Initially, $t_{2i}/2$ provides the
number of the city succeeding i and $(t_{2i+1} - 1)/2$ the number of the city preceding i.
A 2-opt move consists in modifying four values of the array t, which can be done
in constant time. Figure 5.6 illustrates the operating principle of this data structure.

Code 5.3 initializes an array t implementing this data structure from a given tour.
The tour is provided as the list of the cities successively visited (and not as the
successor of each city).

Code 5.4 implements a first improvement heuristic based on this principle.
It uses the shift operator i >> 1 to quickly evaluate the integer division of i by 2,
which gives the number of the city associated with index i. The "exclusive or"
operator i ^ 1 quickly computes the expression i + 1 - 2*(i % 2), providing the
index used to access the adjacent city in the other direction.
Fig. 5.6 Data structure for performing 2-opt moves in constant time. The 2i and 2i + 1 entries
of an array allow identifying both cities adjacent to the city i. Starting from the index 0, we can
reconstitute the tour by following the adjacent city. Starting from the index 1, we can reconstitute
the tour in the other direction. Performing a 2-opt move consists in altering four entries in the array


Code 5.3 build_2opt_data_structure.py Implementation and initialization of the data structure
presented in Fig. 5.6. The tour is provided by the sequence of the cities to visit. The data structure
allows performing a 2-opt move in a constant time


######### Data structure building for performing 2-opt move in constant time
def build_2opt_data_structure(tour):  # Order of visit of the cities
    n = len(tour)
    t = [-1] * 2 * n
    for i in range(n - 1):                           # Forward tour
        t[2 * tour[i]] = 2 * tour[i + 1]
    t[2 * tour[n - 1]] = 2 * tour[0]
    for i in range(1, n):                            # Backward tour
        t[2 * tour[i] + 1] = 2 * tour[i - 1] + 1
    t[2 * tour[0] + 1] = 2 * tour[n - 1] + 1
    return t
Code 5.4 tsp_2opt_first.py Implementation of a first improvement heuristic for the TSP based
on 2-opt neighborhood. This implementation exploits the data structure shown in Fig. 5.6 for
performing the moves in constant time


from build_2opt_data_structure import build_2opt_data_structure  # Listing 5.3
from tsp_utilities import tsp_2opt_data_structure_to_tour        # Listing 12.2

######### Local search with 2-opt neighborhood and first improvement policy
def tsp_2opt_first(d,       # Distance matrix
                   tour,    # Solution
                   length): # Solution length

    n = len(tour)
    t = build_2opt_data_structure(tour)
    i = last_i = 0  # i = starting city || last_i = i - a complete tour
    while t[t[i]] >> 1 != last_i:  # Index i has made 1 turn without improvement
        j = t[t[i]]
        while j >> 1 != last_i and (t[j] >> 1 != last_i or i >> 1 != last_i):
            delta = d[i >> 1][j >> 1] + d[t[i] >> 1][t[j] >> 1] \
                  - d[i >> 1][t[i] >> 1] - d[j >> 1][t[j] >> 1]
            if delta < 0:                      # An improving move is found
                next_i, next_j = t[i], t[j]    # Perform move
                t[i], t[j] = j ^ 1, i ^ 1
                t[next_i ^ 1], t[next_j ^ 1] = next_j, next_i
                length += delta                # Update solution cost
                last_i = i >> 1                # Solution improved: i must make another turn
            j = t[j]                           # Next j
        i = t[i]                               # Next i

    return tsp_2opt_data_structure_to_tour(t), length


5.1.4 Neighborhood Properties 


A neighborhood connects various solutions of the problem. Thus, a graph can 
represent it. The vertices are the solutions. An edge connects two neighbor solutions. 
The edges can be directed if the moves are not immediately reversible. An efficient 
neighborhood, or its representative graph, should have certain properties. 


5.1.4.1 Connectivity 


The connectivity property of a neighborhood stipulates that at least one globally
optimal solution can be reached from any feasible solution. In other words, there
must be a path from any vertex to a vertex representing an optimal solution. This
path is not necessarily monotonically improving.
5.1.4.2 Low Diameter


A neighborhood is expected to allow discovering an optimal solution in a few steps. 
By definition, the diameter of a graph is the maximum length of a shortest path 
connecting two vertices. 
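The diameter can be computed by brute force on tiny neighborhood graphs. The sketch below, an illustration not taken from the text, builds the graph of all permutations of n elements for two hypothetical neighborhoods, exchanging any two elements or only two consecutive ones, and runs a breadth-first search from every vertex; the function names are hypothetical.

```python
# Illustrative sketch; the neighborhoods and names are hypothetical examples.
from collections import deque
from itertools import permutations


def neighborhood_diameter(n, neighbors):
    """Diameter of the neighborhood graph on all permutations of range(n)."""
    diameter = 0
    for source in permutations(range(n)):
        dist = {source: 0}  # Breadth-first search from source
        queue = deque([source])
        while queue:
            s = queue.popleft()
            for t in neighbors(s):
                if t not in dist:
                    dist[t] = dist[s] + 1
                    queue.append(t)
        diameter = max(diameter, max(dist.values()))  # Eccentricity of source
    return diameter


def swap_any(s):
    """Exchange any two elements: n(n - 1)/2 neighbors."""
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            t = list(s)
            t[i], t[j] = t[j], t[i]
            yield tuple(t)


def swap_adjacent(s):
    """Exchange two consecutive elements only: n - 1 neighbors."""
    for i in range(len(s) - 1):
        t = list(s)
        t[i], t[i + 1] = t[i + 1], t[i]
        yield tuple(t)
```

For n = 4, exchanging any two elements gives a diameter of 3 (that is, n - 1), while exchanging only consecutive elements gives 6 (that is, n(n - 1)/2): the larger neighborhood reaches any solution in fewer steps.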


5.1.4.3 Low Ruggedness


A neighborhood should have as few local optima as possible and a strong correlation 
between the values of neighbor solutions. The ideal would be to have only one, 
which would then be the global optimum, achievable every time starting from 
any solution. This property is certainly not satisfied for intractable problems and 
polynomial neighborhoods. However, finding an adequate neighborhood for the 
problem under consideration is essential for the success of an improvement method. 

For problems like the TSP, many neighborhoods have been devised, some being 
remarkably effective for obtaining excellent solutions. This can most likely be 
explained by the visual side of the problem, which considerably supports us in 
deciding what modifications to make to a solution to improve it. For other problems,
suitable neighborhoods are hard to devise, and these sometimes lead to “egg
carton”-type landscapes that are very poorly suited to optimization.

One possibility for smoothing a neighborhood is to modify the fitness function.
The flying elephant method of [13] exploits this trick for the optimization of
continuous functions. The $|x|$ terms in the objective function are replaced by
$\sqrt{x^2 + t^2}$ and terms of the type $\max(0, x)$ by $(x + \sqrt{x^2 + t^2})/2$. When the
parameter $t$ tends toward 0, the modified fitness function gets closer to the original
objective function. The name flying elephants is a metaphor: a large round object
dropped on a rugged field rolls further, to a lower altitude, than a small one, which
stops in the first small basin.
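The two substitutions can be checked numerically. The sketch below simply codes the smoothed terms given above (the function names are illustrative) and prints their largest deviation, which occurs at x = 0 and shrinks with t.

```python
import math


def smooth_abs(x, t):
    """Smooth replacement of |x|: sqrt(x^2 + t^2), differentiable for t > 0."""
    return math.sqrt(x * x + t * t)


def smooth_max0(x, t):
    """Smooth replacement of max(0, x): (x + sqrt(x^2 + t^2)) / 2."""
    return (x + math.sqrt(x * x + t * t)) / 2


for t in (1.0, 0.1, 0.001):
    # The gap to the original terms is largest at x = 0: t for |x|, t/2 for max(0, x)
    print(t, smooth_abs(0.0, t), smooth_max0(0.0, t))
```

Both approximations overestimate the original terms, and the error is bounded by t and t/2, respectively, so decreasing t progressively restores the original, more rugged, objective.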


5.1.4.4 Small Size 


Since improvement methods are based on the systematic evaluation of neighbor
solutions, the neighborhood size should not be excessively large. For instance, the
2-opt neighborhood for the TSP is in O(n²). This allows addressing problems with
several thousand cities. This is not possible with the 3-opt neighborhood, in O(n³).
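The quadratic and cubic growth can be checked by brute-force enumeration. This sketch counts, for an n-city tour whose edge e_i joins positions i and i + 1 (mod n), the pairs and triples of pairwise non-adjacent edges that a 2-opt or 3-opt move removes; each 3-opt triple may admit several reconnections, which is not counted here.

```python
def count_2opt_moves(n):
    """Pairs of non-adjacent edges of an n-city tour."""
    return sum(1 for i in range(n) for j in range(i + 2, n)
               if not (i == 0 and j == n - 1))


def count_3opt_edge_triples(n):
    """Triples of pairwise non-adjacent edges (k and i are also checked cyclically)."""
    return sum(1 for i in range(n) for j in range(i + 2, n) for k in range(j + 2, n)
               if n - k + i >= 2)


for n in (10, 20, 40):
    # Closed forms: n(n-3)/2 for 2-opt, n(n-4)(n-5)/6 edge triples for 3-opt
    print(n, count_2opt_moves(n), n * (n - 3) // 2,
          count_3opt_edge_triples(n), n * (n - 4) * (n - 5) // 6)
```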


5.1.4.5 Fast Evaluation 


Finally, the algorithmic complexity of evaluating the quality of the neighbor
solutions should be as low as possible. For an asymmetric traveling salesman
problem, the advantage of the small 2-opt neighborhood size is negated by the fact
that a constant-time evaluation of the cost variation is no longer possible. In this case,
evaluating the larger 3-opt neighborhood could be faster. As shown in Fig. 5.7, part of the
tour is reversed with the 2-opt neighborhood. Thus, the ruggedness of the 2-opt
neighborhood is also higher than that of the 3-opt for highly asymmetric problems.

Fig. 5.7 2-opt move where two edges are replaced by two other edges. This type of move is only
suitable for symmetrical problems because a part of the tour is traversed in the opposite direction,
which can significantly alter its cost if the problem is not symmetrical
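The difference in evaluation cost can be made explicit. In this sketch (a tour is a list of cities, d a distance matrix; all names are illustrative), the symmetric 2-opt variation uses four matrix entries, whereas the asymmetric case must re-cost every arc of the reversed sub-path, which takes time proportional to its length.

```python
import random


def tour_length(tour, d):
    n = len(tour)
    return sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n))


def apply_2opt(tour, i, j):
    """Remove arcs (t[i], t[i+1]) and (t[j], t[j+1]); reverse the sub-path t[i+1..j]."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]


def delta_2opt_symmetric(tour, d, i, j):
    """Cost variation in O(1); only valid if d is symmetric."""
    n = len(tour)
    a, b, c, e = tour[i], tour[i + 1], tour[j], tour[(j + 1) % n]
    return d[a][c] + d[b][e] - d[a][b] - d[c][e]


def delta_2opt_asymmetric(tour, d, i, j):
    """Exact variation for an asymmetric d: O(j - i) because b..c is traversed backward."""
    delta = delta_2opt_symmetric(tour, d, i, j)
    for k in range(i + 1, j):
        u, v = tour[k], tour[k + 1]
        delta += d[v][u] - d[u][v]  # arc u -> v is now traversed as v -> u
    return delta


rng = random.Random(1)
n = 8
asym = [[0 if i == j else rng.randint(1, 50) for j in range(n)] for i in range(n)]
sym = [[min(asym[i][j], asym[j][i]) for j in range(n)] for i in range(n)]
tour = list(range(n))
rng.shuffle(tour)
print(delta_2opt_asymmetric(tour, asym, 1, 5), delta_2opt_symmetric(tour, sym, 1, 5))
```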


5.2 Neighborhood Limitation 


Typically, the size of a neighborhood grows quadratically or cubically with the
size of the problem instance. As local searches require repeatedly evaluating the
entire neighborhood, the computations become prohibitive as the instance size increases.
Various techniques have been proposed to limit this growth.


5.2.1 Candidate List 


A first idea is to assume that a favorable move for a solution will remain
good for similar solutions. A general method for limiting the computational effort
is first to evaluate all the moves applicable to a given solution and to store a selection
of the best ones in a candidate list. Only the moves contained in the list are evaluated
for some iterations. Periodically, the whole neighborhood must be evaluated again. Indeed,
the solution has likely changed substantially since the candidate list was built, so
moves that were unfavorable can become interesting and vice versa.
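A minimal sketch of this principle, assuming moves can be enumerated and applied through caller-supplied functions (the names, the list size and the refresh period are illustrative choices). Between full evaluations, only the stored candidates are tried. The list is rebuilt every few moves or as soon as no candidate improves, and the search stops only when a full evaluation finds no improving move, so the result is a genuine local optimum.

```python
def candidate_list_search(x, cost, moves, apply_move, list_size=5, refresh_period=10):
    """Best-improvement local search restricted to a periodically rebuilt candidate list."""
    current = cost(x)
    candidates, age = [], refresh_period  # forces a full evaluation at the start
    while True:
        full = age >= refresh_period
        if full:
            # Full neighborhood evaluation: keep the best moves as candidates
            candidates = sorted(moves(x), key=lambda m: cost(apply_move(x, m)))[:list_size]
            age = 0
        best_move, best_value = None, current
        for m in candidates:
            value = cost(apply_move(x, m))
            if value < best_value:
                best_move, best_value = m, value
        if best_move is None:
            if full:
                return x, current  # even the best move of the whole neighborhood fails
            age = refresh_period   # candidates exhausted: rebuild the list
            continue
        x, current = apply_move(x, best_move), best_value
        age += 1


# Toy usage: bit flips minimizing the Hamming distance to a hidden target
target = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1)


def hamming(x):
    return sum(a != b for a, b in zip(x, target))


def flip(x, i):
    return x[:i] + (1 - x[i],) + x[i + 1:]


solution, value = candidate_list_search((0,) * 10, hamming, lambda x: range(len(x)), flip)
print(solution, value)
```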
5.2.1.1 Candidate List for the Euclidean TSP


In the case of the TSP, the evaluation of a move is independent of the solution. Hence, a
candidate list of moves does not need to be periodically reconstructed. However, a
mechanism is needed to detect whether a given move is valid; for instance, a
move can create two or more sub-cycles. If the TSP cities lie in the Euclidean
plane, building a Delaunay triangulation requires O(n log n) work.

The candidate moves only consider the edges present in the triangulation. It can
be proved that a Delaunay triangulation has O(n) edges and an average vertex
degree not exceeding six. Hence, the size of this limited neighborhood is in O(n).
Empirical observation reveals that almost all the edges of an optimal tour belong to the
Delaunay triangulation. This is illustrated in Fig. 5.8.
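The bound on the number of edges follows from Euler's formula, because a Delaunay triangulation is a planar graph. The short derivation below assumes at least three cities and a connected triangulation with n vertices, E edges and F faces. Each face is bounded by at least three edges, and each edge borders at most two faces.

```latex
n - E + F = 2, \qquad 3F \le 2E
\;\Longrightarrow\; E \le 3n - 6,
\qquad \bar{\delta} = \frac{2E}{n} \le 6 - \frac{12}{n} < 6 .
```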

Unfortunately, this technique applies only to Euclidean problems, since the
construction of a Delaunay triangulation relies on geometric properties. A general
neighborhood limitation technique uses a form of learning with solutions (see
Chap. 10) or vocabulary building (see Sect. 8.2).

The idea behind this technique is to generate a number of solutions and limit the
neighborhood to moves involving only elements that belong to these solutions.
These solutions do not need to be of exceptional quality, but they must have diverse
structures and share portions with good solutions to the problem. Also, obtaining them
should not require excessive computational effort. Chapter 6 shows how to proceed
with large instances.
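For the TSP, this idea can be sketched by taking the union of the edges of a few tours as candidate edges; a move is then considered only if every edge it adds is in this set. The function name and the toy tours are illustrative.

```python
def merged_candidate_edges(tours):
    """Union of the edges of several tours, stored as a set of neighbors for each city."""
    candidates = {}
    for tour in tours:
        n = len(tour)
        for k in range(n):
            u, v = tour[k], tour[(k + 1) % n]
            candidates.setdefault(u, set()).add(v)
            candidates.setdefault(v, set()).add(u)
    return candidates


edges = merged_candidate_edges([[0, 1, 2, 3, 4], [0, 2, 1, 3, 4]])
print(edges)
```

Each city keeps only a few candidate neighbors, at most twice the number of tours, instead of n − 1.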
Fig. 5.8 Optimal tour of the TSP instance tsp225 with the Delaunay triangulation superimposed.
Here, all the edges of the optimal tour are part of the triangulation
5.2.1.2 TSP Neighborhood Limitation with 1-Trees


Another general neighborhood limitation technique for the TSP, proposed by
Helsgaun [5], uses the concept of 1-tree (see Sect. 3.1.1.2 presenting a Lagrangian
relaxation for the TSP). In short, the idea is to compute, for each edge (i, j), the
extra cost that would result from forcing this edge into the 1-tree. The neighborhood
is then reduced by keeping, for each vertex, only a few adjacent edges (typically 5)
with the lowest extra costs. A beneficial side effect of this edge length modification
technique is to lower the ruggedness of the neighborhood.
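The following sketch illustrates the extra-cost computation in a simplified setting. It uses a plain minimum spanning tree instead of a 1-tree and omits the Lagrangian edge-length modification, so it is not Helsgaun's exact method. Forcing edge (i, j) into a minimum spanning tree costs d(i, j) minus the heaviest edge on the tree path between i and j, which can be computed for all pairs in O(n²).

```python
import math
import random


def mst_prim(d):
    """Prim's algorithm on a dense distance matrix, O(n^2); returns the parent of each vertex."""
    n = len(d)
    in_tree, key, parent = [False] * n, [math.inf] * n, [-1] * n
    key[0] = 0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: key[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and d[u][v] < key[v]:
                key[v], parent[v] = d[u][v], u
    return parent


def extra_costs(d):
    """alpha[i][j] = length increase of a spanning tree forced to contain edge (i, j)."""
    n = len(d)
    parent = mst_prim(d)
    adj = [[] for _ in range(n)]
    for v in range(1, n):
        adj[v].append(parent[v])
        adj[parent[v]].append(v)
    alpha = [[0.0] * n for _ in range(n)]
    for s in range(n):
        heaviest = [None] * n  # heaviest tree edge on the path from s
        heaviest[s] = -math.inf
        stack = [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if heaviest[v] is None:
                    heaviest[v] = max(heaviest[u], d[u][v])
                    stack.append(v)
        for t in range(n):
            if t != s:
                alpha[s][t] = d[s][t] - heaviest[t]
    return alpha, parent


def candidate_lists(d, size=5):
    """Keep, for each city, the size edges with the lowest extra cost (ties: shortest)."""
    alpha, _ = extra_costs(d)
    n = len(d)
    return [sorted((j for j in range(n) if j != i), key=lambda j: (alpha[i][j], d[i][j]))[:size]
            for i in range(n)]


rng = random.Random(3)
points = [(rng.random(), rng.random()) for _ in range(30)]
dist = [[math.dist(p, q) for q in points] for p in points]
alpha, parent = extra_costs(dist)
lists = candidate_lists(dist, 5)
print(lists[0])
```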

The algorithmic complexity of this technique depends on the construction of a minimum spanning
tree. As seen in Sect. 2.1, this complexity depends on the number of edges. For
large instances, it is necessary to start by reducing their number, for example, with
the tour merging technique presented in the previous section. Both neighborhood
reduction techniques have similarities with granular search.


5.2.2 Granular Search 


Granular search consists of eliminating a priori the solutions with certain characteristics.
To illustrate this on the vehicle routing problem, one can assume that good solutions
will not include a path directly connecting distant customers. For this problem,
[12] proposed to ignore the solutions involving trips between two customers
whose length exceeds a threshold. This threshold is set to β times the average trip
length of a solution obtained with a fast constructive heuristic, where β is a
parameter generally slightly greater than 1, called the local search granularity.
However, the paths between the depot and the customers are always kept,
whatever their length.
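A sketch of the granular filter, under the assumption that the average trip length is the total length of the reference solution divided by its number of arcs, and that each route starts and ends at the depot. The function name and the toy data are illustrative.

```python
def granular_edges(d, routes, beta, depot=0):
    """Edges (i, j), i < j, kept by granular search: length at most beta times the
    average arc length of a reference solution, plus every depot edge."""
    total, arcs = 0.0, 0
    for route in routes:
        stops = [depot] + list(route) + [depot]
        total += sum(d[u][v] for u, v in zip(stops, stops[1:]))
        arcs += len(stops) - 1
    threshold = beta * total / arcs
    n = len(d)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if depot in (i, j) or d[i][j] <= threshold}


# Customers on a line; the depot is at position 0
positions = [0, 1, 2, 10, 11]
d = [[abs(p - q) for q in positions] for p in positions]
kept = granular_edges(d, routes=[[1, 2], [3, 4]], beta=1.1)
print(sorted(kept))
```

Here the reference solution has an average arc length of 26/6, so with β = 1.1 only the two short customer-to-customer edges survive, besides the depot edges.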

A similar technique has been used extensively for the TSP. Instead of considering
a complete graph where each city is connected to all others, each city is only connected to
its p closest neighbors, with p limited to a few dozen. Thus, the size of a 2-opt
neighborhood is in O(n · p) instead of O(n²). The quality loss of the solutions obtained with
such a neighborhood reduction is often negligible.

However, implementing this idea is not trivial. First, the reduced graph
where each node is connected to its p nearest neighbors may not be connected.
It is therefore necessary to add longer edges so that the graph contains at least one
cycle passing through all nodes. Second, the local search implementation is more
complex.
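The first difficulty is easy to reproduce. This sketch builds the p-nearest-neighbor lists of Euclidean cities and checks whether the induced undirected graph is connected. Connectivity is only a necessary condition for containing a tour; the two-cluster instance is an illustrative example.

```python
import math
from collections import deque


def nearest_neighbor_lists(points, p):
    """For each city, the p closest other cities (Euclidean distance)."""
    return [sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))[:p]
            for i in range(len(points))]


def is_connected(lists):
    """Breadth-first search on the undirected graph induced by the neighbor lists."""
    n = len(lists)
    adj = [set() for _ in range(n)]
    for i, neighbors in enumerate(lists):
        for j in neighbors:
            adj[i].add(j)
            adj[j].add(i)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return len(seen) == n


# Two distant clusters of three cities each
cities = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
print(is_connected(nearest_neighbor_lists(cities, 2)),   # clusters stay separate
      is_connected(nearest_neighbor_lists(cities, 3)))
```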

For instance, the data structure shown in Sect. 5.1.3.3 for the 2-opt neighborhood
cannot be used directly. Indeed, it is fast to determine the city s_i succeeding
city i and a city j close to i (there are only p candidates). But it is not possible
to immediately identify the city s_j succeeding j when following the tour in the
direction i → s_i.

In the same way, for the 3-opt neighborhood, we can quickly detect three cities
i, j, and k that are close and can be candidates for a 3-opt move. But we cannot
determine in constant time whether, starting from i, the tour visits city j before
city k. Indeed, if the 3-opt move (i, j, k) is feasible for a solution, the move
(i, k, j) is not: it creates three sub-tours.
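The last claim can be verified with a successor array, using the segment-exchange 3-opt move of Code 5.5, s[i], s[j], s[k] = s[j], s[k], s[i]. This sketch applies the move with the three cities in tour order, then in the order (i, k, j), and counts the resulting cycles.

```python
def count_subtours(succ):
    """Number of cycles described by a successor array."""
    seen = [False] * len(succ)
    cycles = 0
    for start in range(len(succ)):
        if not seen[start]:
            cycles += 1
            city = start
            while not seen[city]:
                seen[city] = True
                city = succ[city]
    return cycles


def three_opt(succ, i, j, k):
    """Segment-exchange 3-opt move on a copy of the successor array."""
    s = list(succ)
    s[i], s[j], s[k] = s[j], s[k], s[i]
    return s


n = 10
tour = [(c + 1) % n for c in range(n)]           # 0 -> 1 -> ... -> 9 -> 0
print(count_subtours(three_opt(tour, 1, 4, 7)))  # i, j, k in tour order → 1
print(count_subtours(three_opt(tour, 1, 7, 4)))  # order (i, k, j) → 3
```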


5.3 Neighborhood Extension 


There are problems for which it is challenging to imagine reasonable neighborhoods 
which substantially modify a solution. Hence, the following problem arises: how to 
implement substantial changes to a solution on the basis of a simple and limited 
neighborhood. 

Note that there is no contradiction between limiting the size of a simple
neighborhood with the techniques described above (to eliminate the moves that
will never lead to good solutions) and extending this limited neighborhood with
the techniques presented in this section.


5.3.1 Filter and Fan 


To construct an extended neighborhood $N_k$ from the definition of a small set $M$
of moves applicable to a solution, we can consider $k$ successive modifications:
$N_k(s) = \{s' \mid s' = s \oplus m_1 \oplus \dots \oplus m_k,\; m_1, \dots, m_k \in M(s)\}$. The size of $N_k$ grows
exponentially with $k$. To avoid such growth, we can use the beam search strategy
presented in Sect. 4.4.1, adapted to a local search rather than a constructive
method.

This technique, proposed by Glover [4], is called the filter and fan strategy. Each
level retains only the p best neighbor solutions, some of which may be worse than the
starting solution. Then their neighbors are evaluated, and the process is repeated
up to level k. Thus, at each step, up to k modifications are made according to the
original neighborhood. Note that the best solution encountered
when evaluating the extended neighborhood does not necessarily belong to the last
level. This process is illustrated in Fig. 5.9.
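A minimal sketch of one filter and fan step, assuming hashable solutions and a caller-supplied neighborhood function (names and parameter values are illustrative). Solutions already kept at a previous level are not revisited. The function returns the best solution visited at any level, and the surrounding loop uses it as the next current solution while it improves.

```python
def filter_and_fan(s, cost, neighbors, p=3, k=3):
    """Explore up to k successive moves, keeping only the p best solutions per level."""
    level, seen = [s], {s}
    best, best_cost = None, float("inf")
    for _ in range(k):
        fan = {t for u in level for t in neighbors(u) if t not in seen}
        if not fan:
            break
        level = sorted(fan, key=cost)[:p]   # filter: p best solutions of this level
        seen.update(level)
        if cost(level[0]) < best_cost:
            best, best_cost = level[0], cost(level[0])
    return best, best_cost


# Toy usage: bit flips minimizing the Hamming distance to a target
target = (1, 1, 0, 1, 0, 1, 1, 0)


def hamming(x):
    return sum(a != b for a, b in zip(x, target))


def flips(x):
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]


one_step = filter_and_fan((0,) * 8, hamming, flips)
x = (0,) * 8
while True:
    y, c = filter_and_fan(x, hamming, flips)
    if y is None or c >= hamming(x):
        break
    x = y
print(one_step[1], x)
```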

A variant of filter and fan search retains at each level not a fixed number p of
neighbor solutions, but all the improving ones. The neighbor solution finally
retained at level 1 is not the one that leads to the best solution up to level k, but
the one that most opens the fan, that is to say, the solution of level k − 1 that has
the most improving solutions at level k.
[Figure: starting solution and moves applied to it; p best neighbors at level 1; p best neighbors at level 2; ...; level k; the best solution visited becomes the starting solution for the next step]

Fig. 5.9 Filter and fan with p = 3 and k = 3. Each level retains at most p neighbor solutions.
The next current solution is the best of those listed


5.3.2 Ejection Chain 


Another way to build a large neighborhood from basic modifications is to go through
infeasible solutions. The term ejection chain was proposed by Glover [3]. It
combines and generalizes ideas from different sources, the most famous certainly being
the Lin and Kernighan neighborhood for the TSP.

A starting solution is transformed into an object called a reference structure.
This structure is not a proper solution, but it can easily be transformed either into other
reference structures or into feasible solutions. The starting solution is disrupted by
the ejection of one of its components to obtain a reference structure, which can in turn
be transformed by the ejection of another component. This chain of ejections ends
either when a solution better than the starting one has been identified or when all
the elements to eject have been tested.

If an improving solution is discovered, the process is reiterated from it. Otherwise,
the chain is initiated by trying to eject another item from the initial solution.
The process stops when all possible chain initializations have been tried without
success. To prevent an endless process, it is forbidden either to add to the reference
structure an item previously ejected or to propagate the chain by ejecting an element
that was added to the reference structure.


5.3.2.1 Lin-Kernighan Neighborhood 


One of the most effective neighborhoods for the TSP is due to Lin and Kernighan
[7]. It is based on an ejection chain. The initial tour is transformed into a path by
removing an edge [a, b]. This path is transformed into a reference structure, which
consists of a path linked to a cycle, by including an edge [b, d]. The removal of
the edge [c, d], which is part of the cycle of the reference structure, transforms the
reference structure into another path.
Fig. 5.10 Operating principle of an ejection chain for the TSP. Eject an edge [a, b] to get a path.
Insert an edge [b, d] to get a reference structure. It can be transformed either into another reference 
structure, by ejection of the edge [c, d] and addition of another edge, or into a tour by ejection of 
[c, d] and addition of the edge [a, c] 
This path is transformed either into a tour, by adding the edge [a, c], or into another
reference structure, by adding another edge incident to c. This node then plays the
role of b of the previous step. Figure 5.10 illustrates the operating principle of this
process.

The Lin and Kernighan local search is presented in Algorithm 5.4. 

The ejection chain mechanism may seem artificial. However, it can produce
relatively complex modifications and improve solutions that are already fairly good, as
illustrated in Fig. 5.11.

A basic Lin and Kernighan implementation is given in Code 12.3. Much more
elaborate, highly efficient implementations are due to [1] and [6].
Algorithm 5.4: Ejection chain (Lin and Kernighan) for the TSP

Input: TSP solution s
Result: Improved solution s

repeat
    Eject edge [a, b] to initiate the chain
    repeat
        Find the edge [b, d] to add, minimizing the reference structure weight,
            such that [c, d] has not been removed in the current ejection chain
        if edge [b, d] found and reference structure weight smaller than the length of s then
            if adding [a, c] and removing [c, d] improves the tour then
                Success: replace solution s by the newly discovered tour
            else
                Add edge [b, d]
                Remove edge [c, d]
                b ← c
        else
            Ejection failure: come back to s and try another ejection
    until s improved or no edge [b, d] exists
until all edges of s have vainly initiated a chain


[Figure: initial solution, path, structure 1, structure 2, structure 3, accepted solution]

Fig. 5.11 Application of an ejection chain on a small TSP. After four ejections, it is possible to
improve the starting solution
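The following sketch follows the structure of Algorithm 5.4 for a symmetric distance matrix. It is a simplified illustration, not the book's Code 12.3. The path is stored as a list whose first element a is fixed. Adding [b, dd] and removing [dd, c] reverses the end of the path, so that c becomes the new free end. The two tabu rules of the ejection chain (no re-adding of a removed edge, no removal of an added edge) guarantee that each chain has at most n steps.

```python
import math
import random


def tour_length(tour, d):
    return sum(d[tour[k - 1]][tour[k]] for k in range(len(tour)))


def edge(u, v):
    return (u, v) if u < v else (v, u)


def lin_kernighan(tour, d):
    """Simplified ejection chain in the spirit of Algorithm 5.4 (symmetric d)."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        length = tour_length(tour, d)
        for start in range(n):
            # Eject edge [b, a] = [path[-1], path[0]]: the tour becomes a path
            path = tour[start:] + tour[:start]
            a = path[0]
            weight = length - d[path[-1]][a]          # length of the path
            removed, added = {edge(path[-1], a)}, set()
            while True:
                b = path[-1]
                best = None
                for i in range(n - 2):                # dd = path[i] plays the role of d
                    dd, c = path[i], path[i + 1]
                    if edge(b, dd) in removed or edge(dd, c) in added:
                        continue                      # tabu rules of the ejection chain
                    if best is None or d[b][dd] < d[b][path[best]]:
                        best = i
                # Reference structure weight: path plus edge [b, dd]
                if best is None or weight + d[b][path[best]] >= length:
                    break                             # ejection failure
                dd, c = path[best], path[best + 1]
                weight += d[b][dd] - d[dd][c]         # add [b, dd], remove [dd, c]
                path = path[:best + 1] + path[best + 1:][::-1]  # c is the new free end
                added.add(edge(b, dd))
                removed.add(edge(dd, c))
                if weight + d[c][a] < length - 1e-9:  # closing with [a, c] improves
                    tour, improved = path, True
                    break
            if improved:
                break
    return tour


rng = random.Random(7)
points = [(rng.random(), rng.random()) for _ in range(25)]
dist = [[math.dist(p, q) for q in points] for p in points]
initial = list(range(25))
rng.shuffle(initial)
result = lin_kernighan(initial, dist)
print(round(tour_length(initial, dist), 3), round(tour_length(result, dist), 3))
```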
5.4 Using Several Neighborhoods or Models


A local optimum is relative to a given neighborhood structure. Hence, it can be
worthwhile to use several neighborhoods. For instance, a 2-optimal TSP tour
is not necessarily 3-optimal.

Once a 2-optimal solution has been found, it may be possible to improve
it with a method using a 3-opt neighborhood (see, for instance, Fig. 9.4). Similarly,
a 3-optimal solution is not necessarily 2-optimal. We can therefore repeat the
improvement processes as long as the solution found is not a local optimum with
respect to all the neighborhood structures considered. The reader can verify this fact
by running Code 12.7.
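The repetition scheme can be sketched generically: descend with each neighborhood in turn and stop once the solution is a local optimum for all of them. In the toy example, a function tabulated on the integers 0 to 9 and two hypothetical neighborhoods (moving x by ±1 or by ±3) stand in for 2-opt and 3-opt.

```python
def descent(x, cost, neighbors):
    """Best-improvement descent with a single neighborhood."""
    while True:
        best = min(neighbors(x), key=cost, default=x)
        if cost(best) >= cost(x):
            return x
        x = best


def multi_neighborhood_descent(x, cost, neighborhoods):
    """Cycle through the neighborhoods until x is a local optimum for all of them."""
    stable, k = 0, 0          # stable: consecutive neighborhoods without improvement
    while stable < len(neighborhoods):
        y = descent(x, cost, neighborhoods[k])
        if cost(y) < cost(x):
            x, stable = y, 1  # x is now a local optimum for neighborhood k
        else:
            stable += 1
        k = (k + 1) % len(neighborhoods)
    return x


f = [5, 4, 6, 9, 2, 7, 8, 0, 9, 9]


def cost(x):
    return f[x]


def step1(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < len(f)]


def step3(x):
    return [y for y in (x - 3, x + 3) if 0 <= y < len(f)]


print(descent(0, cost, step1))                              # → 1
print(multi_neighborhood_descent(0, cost, [step1, step3]))  # → 7
```

Starting from 0, the ±1 neighborhood alone stops at the local optimum 1, whereas alternating both neighborhoods reaches 7, which is a local optimum for both.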

Finally, let us mention that one can switch from one model of the problem to
another. Insofar as the neighborhood structures differ between models, it is equally
possible to iterate improvement methods using different models. This technique
may be inapplicable as is, since a feasible solution for one model can be infeasible
for another. In this case, repair methods should be provided when changing the
model. The process is then no longer a strict improvement method. A corollary is
that the search could enter an infinite cycle: repairing a solution obtained with a
first model and then improving it with a second model can cancel out the
improvements obtained with the first model.


5.5 Multi-Objective Local Search 


Various relatively simple local search techniques have been proposed for multi-objective
optimization. We will examine two approaches that are not too difficult to implement
and that produce good solutions.


5.5.1 Scalarizing


A technique mentioned in Sect. 3.2.1 for multi-objective optimization is scalarizing.
It aggregates the objectives by associating a weight w_i with the i-th one. By
providing a vector of weights and transmitting the resulting scalar function to a local
search, we can thus get an approximation of a supported solution. By varying the weight vector,
we can discover more of these approximations. However, this technique produces at
best one solution approximating the Pareto set for each local search run.
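The limitation to supported solutions can be seen on a small example. Among the four illustrative objective vectors below, (5, 3) is efficient (no other vector dominates it) but lies above the convex hull of the others, so no weighted sum ever selects it.

```python
def scalarize(f, w):
    """Weighted sum of the objective values f."""
    return sum(wi * fi for wi, fi in zip(w, f))


points = [(0, 10), (2, 4), (5, 3), (10, 0)]   # (5, 3) is efficient but unsupported
found = set()
for i in range(101):
    w = (i / 100, 1 - i / 100)
    found.add(min(points, key=lambda p: scalarize(p, w)))
print(sorted(found))  # → [(0, 10), (2, 4), (10, 0)]
```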

Without much more programming effort, it is possible to get a better
approximation of the Pareto set by transmitting all the objectives to the local
search. Hence, we can check whether each neighbor solution improves the Pareto
set approximation. An elementary implementation of this principle is presented
in Algorithm 5.5. Figure 5.12 illustrates the behavior of Algorithm 5.5 for three
different scalarizations.

Algorithm 5.5: Framework of an improvement method for multi-objective
optimization. The user must provide a parameter I_max giving the number of
scalarizations. Here, the weights are just randomly generated

Input: Solution s, neighborhood N(·), objective functions f(·) to minimize; parameter I_max
Result: Approximation P of the Pareto set
P = {s}
for I_max iterations do
    Randomly draw a weight vector w
    repeat
        end ← true
        best_neighbor_value ← ∞
        forall s' ∈ N(s) do
            if s' is not dominated by solutions of P then
                Insert s' in P and remove the solutions of P dominated by s'
            if w · f(s') < best_neighbor_value then
                best_neighbor_value ← w · f(s')
                best_neighbor ← s'
        if best_neighbor_value < w · f(s) then
            s ← best_neighbor
            end ← false
    until end
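A minimal sketch of Algorithm 5.5, assuming hashable solutions and a simple dictionary as the Pareto archive (adequate only for small sets, see Sect. 5.5.3). A neighbor whose objective values equal those of an archived solution is not inserted; this tie rule is a choice of the sketch. The toy problem (integers 0 to 20, moves of ±1, objectives (x − 5)² and (x − 15)²) is illustrative; its efficient solutions are 5, ..., 15.

```python
import random


def dominates(u, v):
    """u dominates v (minimization): no worse everywhere and different."""
    return all(a <= b for a, b in zip(u, v)) and u != v


def insert_pareto(archive, s, fs):
    """Insert s in archive (dict solution -> objectives) unless it is dominated or a duplicate."""
    if any(dominates(fp, fs) or fp == fs for fp in archive.values()):
        return False
    for p in [p for p, fp in archive.items() if dominates(fs, fp)]:
        del archive[p]
    archive[s] = fs
    return True


def scalarized_local_search(s, neighbors, objectives, i_max, rng):
    """I_max best-improvement descents on random weighted sums; every neighbor
    evaluated on the way is offered to the Pareto archive."""
    archive = {s: objectives(s)}
    for _ in range(i_max):
        w = [rng.random() for _ in objectives(s)]

        def scalar(x, w=w):
            return sum(wi * fi for wi, fi in zip(w, objectives(x)))

        while True:
            best, best_value = None, float("inf")
            for t in neighbors(s):
                insert_pareto(archive, t, objectives(t))
                if scalar(t) < best_value:
                    best, best_value = t, scalar(t)
            if best is None or best_value >= scalar(s):
                break
            s = best
    return archive


def objectives(x):
    return ((x - 5) ** 2, (x - 15) ** 2)


def neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y <= 20]


archive = scalarized_local_search(0, neighbors, objectives, 30, random.Random(0))
print(sorted(archive))
```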


5.5.2 Pareto Local Search 


An alternative approach to scalarization is the Pareto local search [10]. The idea
is to start with any solution to the problem, which forms a first, poor-quality
estimate of the Pareto set. As long as the estimated Pareto set is not stabilized,
all the neighbor solutions of the estimated Pareto set are generated and used to update it.

Recursive procedures allow expressing the method very concisely, as shown by 
Algorithm 5.6. Code 5.5 implements a Pareto local search for the TSP. 

However, the method can be sped up by not starting with an arbitrary solution,
but by calling it several times with good solutions obtained by scalarizing the
objectives. Indeed, very effective methods usually exist for solving single-objective
problems. This allows us to immediately get supported solutions.
Fig. 5.12 Trajectory and evolution of an iterated local search for the bi-objective TSP instance
(EuclidAB100) using the scalarization technique. The improvement method is based on an ejection 
chain. The weights associated with the objectives are randomly drawn each time the search reaches 
a local optimum. The starting solution is obtained by a greedy constructive method working solely 
on the first objective 


For instance, polynomial algorithms exist for the linear assignment problem,
while its multi-objective version is NP-hard. Starting Pareto local search with a
reliable estimate of the Pareto set limits the number of updates, and the procedure
stops faster. This also avoids excessively deep recursive calls, which could overflow
the recursion stack.
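To sidestep the recursion depth issue altogether, Pareto local search (Algorithm 5.6) can also be written with an explicit stack: every solution newly accepted in the archive is pushed, and its neighborhood is evaluated later. This sketch assumes hashable solutions and stores the archive in a dictionary. The toy problem is the illustrative one whose efficient solutions are the integers 5 to 15.

```python
def pareto_local_search(start, neighbors, objectives):
    """Pareto local search with an explicit stack instead of recursive calls."""
    archive = {}  # solution -> objective vector

    def update(s):
        fs = objectives(s)
        # Reject s if an archived vector is at least as good on every objective
        if any(all(a <= b for a, b in zip(fp, fs)) for fp in archive.values()):
            return False
        for p in [p for p, fp in archive.items() if all(a <= b for a, b in zip(fs, fp))]:
            del archive[p]  # remove the solutions dominated by s
        archive[s] = fs
        return True

    update(start)
    stack = [start]
    while stack:
        s = stack.pop()
        for t in neighbors(s):
            if update(t):
                stack.append(t)  # its neighborhood will be evaluated later
    return archive


def objectives(x):
    return ((x - 5) ** 2, (x - 15) ** 2)


def neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y <= 20]


archive = pareto_local_search(0, neighbors, objectives)
print(sorted(archive))
```

Unlike the scalarized version, no parameter is needed, and here the whole efficient set is recovered.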
Algorithm 5.6: Framework of Pareto local search for multi-objective
optimization. The method has the advantage of containing no parameter

Neighborhood_evaluation
    Input: Solution s; neighborhood N(·); objective functions f(·)
    Result: Pareto set P completed with neighbors of s
    forall s' ∈ N(s) do
        Update_Pareto(s', f(s'))

Update_Pareto
    Input: Solution s, objective values v
    Result: Updated Pareto set P
    if (s, v) is not dominated by any solution of P or P = ∅ then
        From P, remove all the solutions dominated by (s, v)
        P ← P ∪ (s, v)
        Neighborhood_evaluation(s)


Code 5.5 tsp_3opt_pareto.py Implementation of Pareto local search for the TSP, based on 3-opt
moves. This function calls another one (Code 12.6) that updates the Pareto set for each neighbor
of the provided solution. The data structure used to store the Pareto set is discussed in Sect. 5.5.3


######### Pareto local search for the TSP based on 3-opt neighborhood
def tsp_3opt_pareto(pareto,  # Pareto front
                    costs,   # Tour length (for each dimension)
                    s,       # Solution (successor of each city)
                    d):      # Distance matrix (one for each dimension)
    from kd_tree_update_pareto import update_3opt_pareto  # Listing 12.6
    from kd_tree_add_scan import K                         # Listing 12.4
    from random_generators import unif                     # Listing 12.1
    costs_neighbor = [-1 for _ in range(K)]  # Cost of neighbor solution
    start = unif(0, len(s) - 1)              # Starting city for move evaluation
    i, j, k = start, s[start], s[s[start]]   # Indices of a 3-opt move
    while True:
        for dim in range(K):
            costs_neighbor[dim] = costs[dim] \
                + d[dim][i][s[j]] + d[dim][j][s[k]] + d[dim][k][s[i]] \
                - d[dim][i][s[i]] - d[dim][j][s[j]] - d[dim][k][s[k]]
        s[i], s[j], s[k] = s[j], s[k], s[i]  # Change solution to neighbor
        pareto = update_3opt_pareto(pareto, costs_neighbor, s, d)
        s[k], s[j], s[i] = s[j], s[i], s[k]  # Back to solution
        k = s[k]                             # Next k
        if k == i:                           # k at its last value, next j
            j = s[j]; k = s[j]
        if k == i:                           # j at its last value, next i
            i = s[i]; j = s[i]; k = s[j]
        if s[s[i]] == start:                 # Neighborhood completely evaluated
            break
    return pareto


5.5.3 Data Structures for Multi-Objective Optimization 


Both techniques presented above for multi-objective optimization can be relatively
inefficient if no adequate data structure is used to store the Pareto set. Indeed,
evaluating the objectives of a neighbor solution often takes constant time, which
means a few nanoseconds on current computers. Checking whether a solution is
dominated can consume much more time than computing its objectives. Using a
simple list to store the p solutions of the Pareto set may slow down the search by a
factor proportional to p.

It is not uncommon for the Pareto set to contain thousands of solutions. Hence, an
appropriate data structure for testing the dominance of a solution should not require
a computational time growing linearly with the Pareto set size.


5.5.3.1 Array 


Assuming that the K objectives take a limited number of distinct integer values, a
(K − 1)-dimensional array is a simple and extremely efficient data structure to store the
Pareto set.

The size of this array in each dimension is given by the number of distinct values the
corresponding objective can take. A cell of this array stores the best value found
for the K-th objective among the solutions whose other objective values do not exceed
those of the cell. For a bi-objective problem, we have a simple
array. For instance, if we know that the first objective can vary from 2 to 13 and
we have identified solutions with objectives (2, 27), (4, 24), (6, 23), (7, 21), and
(11, 17), the array contains [27, 27, 24, 24, 23, 21, 21, 21, 21, 17, 17, 17]. After the
discovery of the solution (5, 20), the array is updated as follows:

[27, 27, 24, 20, 20, 20, 20, 20, 20, 17, 17, 17].

This data structure is limiting, because the objectives are not necessarily integers
and may not take a reasonable number of distinct values. However, when this data
structure is usable in practice, it is incomparably fast, since the domination status
of a solution is known in constant time.
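The bi-objective case of the example above can be sketched as follows, assuming an integer first objective in a known range [lo, hi] (the class name is illustrative). The dominance test reads a single cell. An insertion lowers the following cells until it meets a value that is already better, so its cost is at most proportional to the range of the first objective.

```python
import math


class ParetoArray:
    """Bi-objective Pareto set; best[v - lo] is the lowest second objective among
    stored solutions whose first objective is at most v."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.best = [math.inf] * (hi - lo + 1)

    def is_dominated(self, f1, f2):
        # A stored point at least as good on both objectives exists: O(1)
        return self.best[f1 - self.lo] <= f2

    def insert(self, f1, f2):
        if self.is_dominated(f1, f2):
            return False
        i = f1 - self.lo
        while i < len(self.best) and self.best[i] > f2:
            self.best[i] = f2
            i += 1
        return True


pareto = ParetoArray(2, 13)
for point in [(2, 27), (4, 24), (6, 23), (7, 21), (11, 17)]:
    pareto.insert(*point)
print(pareto.best)  # → [27, 27, 24, 24, 23, 21, 21, 21, 21, 17, 17, 17]
pareto.insert(5, 20)
print(pareto.best)  # → [27, 27, 24, 20, 20, 20, 20, 20, 20, 17, 17, 17]
```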


5.5.3.2 KD-Tree 


In the general case, a data structure whose query time grows slowly with the
number p of elements stored is the KD-tree. It is a binary search tree where a
node at depth d discriminates the elements of its subtrees on dimension d
mod K. Code 12.4 presents the basic procedures for including a new element in a
KD-tree and inspecting all the elements stored in the tree.

Removing a given node from a KD-tree is tricky to program.
Figure 5.13 gives an example of updating a KD-tree after the discovery of a new
efficient solution. Unlike in a one-dimensional binary search tree, the rightmost node of
the left subtree or the leftmost node of the right subtree generally cannot replace
the removed node. Indeed, a KD-tree discriminates on a different dimension at each
level, so both subtrees must be traversed to find the replacing node. The
latter is itself recursively replaced, until a leaf is reached, which is simply eliminated.

Code 12.5 implements a removal procedure of a given node within a KD-tree.
Finally, to use a KD-tree to update a Pareto set, a function must find a point of the
tree that dominates an attempted solution, if any. For this, it is necessary to check
whether the KD-tree contains a point between the ideal point and the attempted
solution. The ideal point is the one whose value on each dimension is the optimum
of the corresponding objective, taken separately.

Fig. 5.13 KD-tree update for a bi-objective cost/time problem. On the left, the tree contains an
approximation of the Pareto frontier. After the discovery of the efficient solution 287/152, the
dominated solution 298/217 must be found and eliminated. To do this, we look for the solution
with the highest time in its left subtree (307/147). The latter replaces the node of the dominated
solution. The old node 307/147 must be recursively replaced, here, by the node (319/137) with the
lowest cost in its right subtree (because the left subtree is empty). On the right is the situation of
the KD-tree after insertion of the new efficient solution

In practice, the ideal point is unknown, but a coarse approximation is appropriate,
for instance, (−∞, …, −∞) if all the objectives must be minimized. If the KD-tree
contains points in the hyper-rectangle delimited by (−∞, …, −∞) and the
attempted solution, then these points dominate the trial solution, which is thus
ignored. Otherwise, the trial solution is not dominated: the points it dominates, if any,
must be eliminated from the KD-tree, and the trial solution is then added to it. Code 12.6
allows updating a Pareto set when seeking to include a new point.
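The dominance query can be sketched as a range search over the box between (−∞, …, −∞) and the trial point. This is an independent illustration of insertion and query only; node removal, handled by Code 12.5 in the book, is not reproduced. Points equal to the trial point are reported as dominating it.

```python
import random


class Node:
    def __init__(self, point):
        self.point = point  # tuple of K objective values
        self.left = None    # smaller value on the discriminating dimension
        self.right = None   # greater or equal value


def kd_insert(root, point, depth=0):
    """Insert point; a node at depth d discriminates on dimension d mod K."""
    if root is None:
        return Node(point)
    dim = depth % len(point)
    if point[dim] < root.point[dim]:
        root.left = kd_insert(root.left, point, depth + 1)
    else:
        root.right = kd_insert(root.right, point, depth + 1)
    return root


def find_dominating(root, q, depth=0):
    """Return a stored point p with p <= q on every objective (minimization), else None."""
    if root is None:
        return None
    p = root.point
    if all(a <= b for a, b in zip(p, q)):
        return p
    dim = depth % len(q)
    found = find_dominating(root.left, q, depth + 1)
    # Right subtree values on dim are >= p[dim]: explore it only if p[dim] <= q[dim]
    if found is None and p[dim] <= q[dim]:
        found = find_dominating(root.right, q, depth + 1)
    return found


rng = random.Random(5)
stored = [(rng.randint(0, 50), rng.randint(0, 50)) for _ in range(200)]
root = None
for point in stored:
    root = kd_insert(root, point)
queries = [(rng.randint(0, 50), rng.randint(0, 50)) for _ in range(500)]
print(find_dominating(root, (25, 25)))
```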


Problems 


5.1 Local Minima 

Figure 5.2 shows the local optima of a function of a discrete variable x relative
to a neighborhood consisting of changing x by one unit. On this figure, locate the
local optima relative to an asymmetric neighborhood consisting of either adding 4
to or subtracting 3 from x. Does this neighborhood have the connectivity property?


5.2 Minimizing an Explicit Function 

An integer function of integer variables, [−7, 6] × [−6, 7] → [−10, 650], is explicitly
given in Table 5.1. We seek the minimum of this function by applying a local search
with the first improvement policy, starting from the solution (6, 7) of value 650 and
Table 5.1 Integer function f(x, y) explicitly given


a [125] 89 | ss | 97|105 | 109 | 129 [179 [209 246 302 [368 [458 [S25 
(3| 92| s| 70| 70| 7| 94| 98 14s |168 225 282 [339 [415 S10 
388 [486 
386 [454 
361 [Ais 
37i | 429 
[20 | 39 | 89 |118 156 212 [278 [368 | 436 
[19 | 32 | 89 [132 166 219 [290 [377 [435 
a o arty psp snm pepe ns o s 373 457 
Cs | e| 35] 32 | e| st] 56 | 75 [105 [15s 192 [248 [314 404 472 
Seot ss rot m st o o ias 68225 [399 41710 


from the solution (6, −6) of value 510. It is assumed that the moves consist of
modifying the value of a variable by one unit. The moves are checked in the order:
(+1, 0), (0, +1), (−1, 0), and (0, −1). Next, apply a local search with the best
improvement policy, starting from the solution (−7, 7) of value 248 and from the
solution (−7, −6) of value 92.


5.3 2-Opt and 3-Opt Neighborhood Properties 
Show the following properties of the 2-opt and 3-opt neighborhoods: 


* The inversion of two cities or a 3-opt move can be obtained by a succession of
2-opt moves.
* 2-opt and 3-opt neighborhoods have the connectivity property.


Provide an upper bound to the diameter of these neighborhoods. 


5.4 3-Opt for Symmetric TSP 

Section 5.1.3.1, which introduces the 3-opt neighborhood, shows one way to replace
three arcs with three other arcs. This way respects the direction of travel of
the three sub-paths located between the modified arcs. In the case of a symmetric
problem where changing the direction of travel of some sub-paths is accepted,
how many ways are there to replace three edges with three other edges while
keeping a feasible tour?
5.5 4- and 5-Opt

For an asymmetric TSP, how many ways are there to replace four arcs with four 
other arcs while maintaining the direction of travel of the sub-paths? Same question 
for replacing five arcs. 


5.6 Comparing 2-Opt Best and First 

How many moves should be tested to show that a TSP tour with n cities is 2-optimal?
Empirically evaluate the number of repetitions of the external loop. Provide this
number as a function of the problem size for both procedures tsp_2opt_first
and tsp_2opt_best. Analyze the difference if the procedures start either with
the nearest neighbor solution or with a random one. Consider Euclidean instances
whose cities are generated uniformly at random in a square. Explain the results.


5.7 3-Opt Candidate List 

To limit the size of a TSP neighborhood, only the 40 shortest arcs incident to each 
city are considered. With such a limitation, what is the complexity of verifying if a 
solution is 3-optimal? Is a special data structure required to achieve this minimum 
computational complexity? Is the neighborhood generated by such a limitation 
connected? 


5.8 VRP Neighborhoods 

Suggest four different neighborhoods for the vehicle routing problem. Give the
size of these neighborhoods as a function of the number n of customers and
the number m of tours. Specify whether these neighborhoods have the connectivity
property, depending on the problem modeling.


5.9 Steiner Tree Neighborhood 
Two solution modelings have been proposed in Sect. 2.1.2 for the Steiner tree
problem. Suggest neighborhoods adapted to each of these modelings. 


5.10 Ejection Chain for the VRP 

Propose a technique based on ejection chains for the vehicle routing problem. 
Specify how to initialize the chain, how to propagate it, and how to stop it. Estimate 
the computational complexity of evaluating an ejection chain for a solution with n 
customers and m tours. 
Chapter 6
Decomposition Methods


In the process of developing a new algorithm, this chapter should logically have been
placed just after the one devoted to problem modeling. However, decomposition methods
are only used when the data size to process is large, so this phase is optional. The
reader can skim this chapter before moving on to the following parts, devoted to
stochastic and learning methods. This is why it is placed at the end of
the first part of this book, devoted to the essential ingredients of metaheuristics.


6.1 Consideration on the Problem Size 


The algorithmic complexity, very briefly presented in Sect. 1.2.1, aims to evaluate the
computational resources necessary for running an algorithm according to the size of
the data it has to process. We cannot classify problems as large or small only by their
absolute size: sorting an array of 1000 elements is considerably easier than finding
the optimal tour of a TSP instance with 100 cities. The time available to obtain a
solution is clearly important: the perception of what a large instance is might not
be the same if we have to perform real-time processing in a few microseconds or
long-term planning for which a 1-day computation is perfectly acceptable. Very
roughly, we can put NP-hard problem instances in the following categories:


Toy Instances Approximative size: n ≈ 10. To ensure an algorithm works
correctly, it can be executed by hand. Another possibility is to compare its results
to those of a method that is easy to implement but much less efficient, for instance,
an exhaustive enumeration of all solutions. Empirically, we can consider that a
computer is able to perform about 10⁸ elementary operations per second. With a
time budget of this order of magnitude, an exhaustive enumeration can be considered
for a permutation problem instance up to n ≈ 10; for a binary variable problem, up to
n ≈ 20. Naturally, for polynomial algorithms, the instance size processed in one
second varies from n ≈ 50 for complexity in O(n⁵) to n ≈ 10⁸ for linear complexity,
passing through n ≈ 10⁴ for quadratic complexity and n ≈ 10⁶ for an algorithm in
O(n log n).


© The Author(s) 2023 131
É. D. Taillard, Design of Heuristic Algorithms for Hard Optimization, Graduate
Texts in Operations Research, https://doi.org/10.1007/978-3-031-13714-3 6

Small Instances Typical size: $10 \le n \le 10^2$. When the size no longer allows
an exhaustive enumeration of all solutions, we enter the category of small
instances. They can be characterized as those for which robust algorithms are known
that find an optimal solution in a reasonable time. The literature frequently
reports exact algorithms solving instances of “difficult” problems much larger than
those mentioned above. However, such statements should be taken with care: optimal
solutions of traveling salesman or knapsack instances with tens of thousands of
elements have indeed been found, but some much smaller instances remain out of reach
of these programs. Small instances are useful for designing and calibrating heuristic
methods. Knowing the optimal solutions makes it possible to measure the quality of
heuristics and to tune their parameters while keeping computational times
reasonable.

Standard Instances Typical size: $10^2 \le n \le 10^4$. This is the typical
application area of metaheuristics. Such instances are frequently encountered in
real-world applications. They are too large to be solved efficiently by exact methods
or for a human to guess a good-quality solution. The maximum instance size a
metaheuristic can handle is related to its algorithmic complexity, in terms of both
computational time and memory. With more than $10^4$ elements, it becomes challenging
to use a constructive method or a neighborhood of size $O(n^2)$. This is especially
the case if an $n \times n$ matrix has to be stored for efficiency reasons. The
algorithmic complexity of a metaheuristic-based program is frequently larger than
$O(n^2)$. Thus, many authors speak of a "large" instance for a size of 100.

Large Instances Typical size: $10^4 \le n \le 10^5$. Some real instances have a
higher number of items than standard instances, or they must be solved with less
computational effort than a direct method would require. Think, for instance, of
vehicle routing for mail delivery or of item labeling on a geographic map. For
such problems, a size of $10^5$ is not exceptional. In this case, decomposition
methods must be used. This chapter presents some general techniques for
approaching large instances. These techniques can sometimes be advantageously
applied to smaller instances, even with just a few dozen elements.

Huge Instances Size: $n > 10^5$ items. When the size of the problem exceeds $10^8$
to $10^{10}$ items, the data can no longer be completely stored in RAM. In this
case, it is necessary to work on parts of the instance, usually with parallel
algorithms to maintain adequate processing times. Processing this type of
instance raises mainly technical issues and is beyond the scope of this book.
6.2 Recursive Algorithms


When a large instance has to be solved with limited computational effort, it is cut
into small parts that are solved independently. These are then put together to
reconstruct a solution to the complete problem. Such a technique can only bring an
efficiency gain if several conditions are met. Directly solving the problem must
require a more than linear computational effort; otherwise, a decomposition only
makes sense for parallel computation. The parts must be independent of each other.
Combining the parts must be less complex than directly solving the problem. The
difficulty lies in how to define the parts: they must represent a logical portion of
the problem so that their assembly, once solved, is simple.

Merge sort is a typical decomposition algorithm. The list to sort is split into two
roughly equal parts, which are sorted by two recursive calls if they contain more
than one element. Finally, the two sorted sub-lists are scanned together to
reconstruct a complete sorted list.
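As an illustration, here is a minimal Python sketch of merge sort (the function name merge_sort is ours). The two recursive calls solve the parts, and the final loop rebuilds the sorted list in linear time by scanning both sorted halves once.

```python
def merge_sort(lst):
    """Return a sorted copy of lst using divide and conquer."""
    if len(lst) <= 1:                         # a part of size <= 1 is already sorted
        return list(lst)
    mid = len(lst) // 2                       # split into two roughly equal parts
    left, right = merge_sort(lst[:mid]), merge_sort(lst[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # linear-time reconstruction
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])                   # at most one of these is non-empty
    merged.extend(right[j:])
    return merged
```

For example, merge_sort([5, 3, 8, 1, 9, 2]) returns [1, 2, 3, 5, 8, 9].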


6.2.1 Master Theorem for Divide-and-Conquer 


In many cases, the complexity of a recursive algorithm can be assessed with the
master theorem for divide-and-conquer recurrences. Suppose the time to address a
problem of size $n$ is $T(n)$. The algorithm splits the data into $b$ parts of
approximately identical size $n/b$, of which $a$ are solved recursively. These parts
are then combined to reconstruct a solution to the initial problem, which requires a
time $f(n)$. To assess the complexity of such an algorithm, we must solve the
functional equation $T(n) = a \cdot T(n/b) + f(n)$, whose solution depends on the
reconstruction effort.

Introducing $\varepsilon$, a positive constant forcing the function $f(n)$ to be
either smaller or larger than $n^{\log_b a}$, the master theorem allows deducing the
complexity class of $T(n)$ in some cases:


• If $f(n) = O(n^{\log_b a - \varepsilon})$, then $T(n) = \Theta(n^{\log_b a})$.

• If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$.

• If $f(n) = \Omega(n^{\log_b a + \varepsilon})$ and if $a \cdot f(n/b) \le c \cdot f(n)$
for a constant $c < 1$, then $T(n) = \Theta(f(n))$.


Often, $a = b$: there is a recursive call for every part. In this case,
$n^{\log_b a} = n$, and the theorem states that if the reconstruction can be done in
sublinear time, $O(n^{1-\varepsilon})$, then the problem can be solved in linear time.
If the reconstruction takes linear time, which is typically the case for sorting
algorithms, then the problem can be solved in $O(n \log n)$. The last case simply
indicates that all the difficulty of the algorithm is concentrated in the
reconstruction operations. Finally, note that the theorem does not cover all
possible functions $f(n)$.
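When the reconstruction cost is polynomial, $f(n) = \Theta(n^k)$, the three cases can be checked mechanically by comparing $k$ with the critical exponent $\log_b a$. The following Python sketch (the function name master_theorem is ours) covers only such polynomial $f$. For them, the regularity condition of the third case always holds, because $a \cdot (n/b)^k = (a/b^k) \cdot n^k$ and $a/b^k < 1$ whenever $k > \log_b a$.

```python
import math


def master_theorem(a, b, k):
    """Complexity class of T(n) = a*T(n/b) + Theta(n**k), with a >= 1, b > 1, k >= 0.

    Only polynomial reconstruction costs are handled (an assumption of this sketch).
    """
    crit = math.log(a) / math.log(b)          # critical exponent log_b(a)
    if math.isclose(k, crit, abs_tol=1e-12):
        return f"Theta(n^{crit:g} log n)"     # second case
    if k < crit:
        return f"Theta(n^{crit:g})"           # first case, with epsilon = crit - k
    return f"Theta(n^{k:g})"                  # third case
```

Merge sort corresponds to master_theorem(2, 2, 1), which gives "Theta(n^1 log n)". The 2D-tree query discussed next ($a = 2$, $b = 4$, $f(n) = O(1)$) corresponds to master_theorem(2, 4, 0), which gives "Theta(n^0.5)".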
There are also cases where $a \ne b$. An example is a query for a point of the
Euclidean plane among a set of $n$ points stored in a balanced 2D-tree (see the data
structure discussed in Sect. 5.5.3.2). With such a data structure, the number of
points remaining to be examined can be halved by processing at most $a = 2$ parts
among $b = 4$. Indeed, unlike a binary tree in one dimension, we cannot ensure that
this number is divided by two at every level of the tree, but only every two levels.
Since this is a query problem, there is no reconstruction and $f(n) = O(1)$. As
$\log_4 2 = 1/2$, we can choose $\varepsilon = 1/2$, and we are in the first case. We
deduce that the complexity of a query in a 2D-tree is in $O(n^{1/2})$. However, if the
points are well spread, the empirical behavior is better, closer to $\log n$.

Heuristic algorithms proceeding by recursion commonly stop prematurely, before the
parts are so small that their resolution becomes trivial. Even if the parts are
solved exactly, the reconstruction phase generally does not guarantee optimality.
Hence, both the cutting and the reconstruction procedures are heuristics. This means
that the "border areas" between two parts are, more or less obviously, not optimal.
To limit this sub-optimality effect, as few parts as possible should be assembled,
while each part must remain small enough to be processed. Indeed, if the parts are
excessively large, their exact resolution requires too much time, or the heuristics
may produce low-quality parts.


6.3 Low Complexity Constructive Methods 


Solving large instances implies limiting the complexity of the constructive method
that generates an initial solution. Even the most basic greedy method is then not
appropriate. If the function $c(s, e)$ providing the cost of adding an element $e$
actually depends on the partial solution $s$, then its complexity is in
$\Omega(n^2)$. Indeed, before including each of the $n$ elements, $c(s, e)$ must be
evaluated for all the remaining elements. A random construction in linear time is
not suitable either, due to the poor quality of the solutions it produces.

It is therefore necessary to "cheat," by making the hypothesis that not all the
elements of the problem are directly related to all the others. Put differently, an
element is related to a relatively limited number of other elements, and this
relationship possesses a certain symmetry. It is reasonable to assume that the
proximity between two elements can be quantified. In such a case, a complexity in
$O(n^2)$ can be avoided by sampling and recursion. The sub-optimality due to the
assembly of parts can be limited by stopping the recursion at the first or the
second level.
6.3.1 Proximity Graph Construction


There are relatively good automatic classification heuristics to partition a problem
of size $n$ into $k$ groups. The fast variant of Algorithm 2.7 ($k$-medoids) mentioned
in Sect. 2.7.2 produces such a heuristic partition with a complexity of
$\Theta(k \cdot n + n^2/k)$.¹

This complexity is minimized by choosing $k = \sqrt{n}$, which gives
$\Theta(n^{3/2})$. Thus, a problem of size $n$ can be partitioned into $\sqrt{n}$
parts, each comprising approximately $\sqrt{n}$ elements. Performing the clustering on
a random sample of the elements (e.g., of size $\Theta(\sqrt{n})$) can significantly
speed up the procedure. This method is illustrated in Fig. 6.1.
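The sampling-based partition can be sketched in Python as follows. This is only an illustration: it uses a plain alternating $k$-medoids (assign each sampled element to its closest medoid, then move each medoid to the member minimizing the sum of distances), not necessarily the fast variant of Algorithm 2.7. The function name sampled_partition and its parameters sample_factor and iters are hypothetical. Elements are assumed to be points in the plane with Euclidean distances.

```python
import math
import random


def sampled_partition(points, sample_factor=3, iters=5, seed=0):
    """Partition points into about sqrt(n) clusters: k-medoids on a random
    sample, then allocation of every element to its closest medoid."""
    rng = random.Random(seed)
    n = len(points)
    k = max(1, round(math.sqrt(n)))           # k = sqrt(n) balances k*n and n^2/k
    m = min(n, sample_factor * k)             # sample of size Theta(sqrt(n))
    sample = rng.sample(range(n), m)
    medoids = sample[:k]                      # initial medoids: first k sampled elements

    def dist(a, b):
        return math.dist(points[a], points[b])

    for _ in range(iters):                    # constant number of iterations (see footnote)
        clusters = {c: [] for c in medoids}
        for e in sample:                      # assign sampled elements to closest medoid
            clusters[min(medoids, key=lambda c: dist(e, c))].append(e)
        # new medoid: member minimizing the sum of distances within its cluster
        medoids = [min(members or [c], key=lambda x: sum(dist(x, y) for y in members))
                   for c, members in clusters.items()]

    # allocate all n elements to their closest medoid: Theta(k * n) work
    labels = [min(range(k), key=lambda i: dist(e, medoids[i])) for e in range(n)]
    return medoids, labels
```

With 400 random points, this returns 20 medoids and, for every element, the index of its closest medoid. The final allocation loop dominates, in line with the $\Theta(n^{3/2})$ estimate.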

A decomposition into smaller clusters can be obtained by applying a second recursion
level: the instance is first cut into $a$ large parts of relatively similar size, as
presented above. A proximity relationship is defined between large parts, so that each
has $O(1)$ neighbors. A rudimentary proximity definition is as follows: if an element
has $c_i$ as its nearest center and $c_j$ as its second nearest, then $c_i$ and $c_j$
are considered neighbors. Each large part is then partitioned into $b$ small clusters.
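The rudimentary proximity definition between large parts can be written in a few lines of Python. In this sketch, the function name part_neighbors is ours, centers are given as coordinate tuples, and the relation is stored in both directions to keep it symmetric (our choice).

```python
import math


def part_neighbors(points, centers):
    """c_i and c_j are neighbors if some element has c_i as nearest center
    and c_j as second nearest (at least two centers are required)."""
    neighbors = {i: set() for i in range(len(centers))}
    for p in points:
        # indices of the two centers closest to p
        i, j = sorted(range(len(centers)), key=lambda c: math.dist(p, centers[c]))[:2]
        neighbors[i].add(j)
        neighbors[j].add(i)                   # store the relation in both directions
    return neighbors
```

For instance, with centers at 0, 10 and 20 on a line and elements at 4, 6, 14 and 16, the first and last centers are both neighbors of the middle one but not of each other.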

Similarly, a proximity relationship is defined between small clusters: a small
cluster is related to all those belonging to the same large part. By choosing
$a = \Theta(\sqrt{n})$ and $b = \Theta(\sqrt{n})$, we get a decomposition into a number
of small clusters proportional to $n$, of approximately identical size. The overall
algorithmic complexity is $O(n^{3/2})$.

Small clusters of bounded size make sense for some problems. For the vehicle routing
problem, for instance, the maximum number of customers that can be placed on a tour
depends on the application (home service, parcel distribution, rubbish collection)
and not on the total number of customers in the instance.

This decomposition technique is illustrated in Fig. 6.2. Bold lines show proximity 
relations between large parts. The small clusters obtained by decomposition of 
large parts contain about 15 elements. The elements of large parts are represented 
by points of the same color. By exploiting such a decomposition and proximity 
relationships, it becomes possible to efficiently generate a solution to a large 
problem instance. A computational time of about one second was enough to obtain 
the structures of Fig. 6.2, with more than 16,000 entities. 


¹ The notation $O(\cdot)$ cannot be used here because it is assumed that the $k$ groups
contain approximately the same number of elements. In the worst case, a few groups
could contain $\Theta(n)$ elements and $\Theta(n)$ groups could contain $O(1)$
elements, which would imply a theoretical complexity in $O(n^2)$. To limit the
complexity, the algorithm must be stopped prematurely, if needed, by repeating the
external loop a constant number of times.


Fig. 6.1 Illustration of the method for partitioning a problem instance: from the complete set of
$n$ elements of the instance (a), a random sample is selected (b). Algorithm 2.7 is run on the sample
and $k = \Theta(\sqrt{n})$ medoids are identified (c). All the $n$ elements are allocated to the
closest medoid (d)


6.3.2 Linearithmic Heuristic for the TSP


It is possible to extend this decomposition principle to a number of levels depending
on the instance size, and thus get an $O(n \log n)$ algorithm. This section illustrates
the principle on the traveling salesman problem. Rather than reasoning on the
construction of a tour, we build paths passing through all the cities of a given subset.

It is actually straightforward to adapt Code 12.3 so that it treats a path rather
than a tour. An algorithm optimizing a path can equally be used to provide a tour.
Indeed, a TSP tour can be seen as a path starting with city $c_i \in C$ and ending
with $c_i$. For a problem with $n$ cities, the path
$P = (b = c_i, c_1, \ldots, c_{i-1}, c_{i+1}, \ldots, c_n, c_i = e)$ defines a feasible
(random) tour. The path
Fig. 6.2 Two-level decomposition of a problem instance. The elements are clustered into
$\Theta(\sqrt{n})$ large parts of approximately identical size. Bold lines show the proximity
relationship between large parts. The latter are themselves decomposed into $\Theta(\sqrt{n})$
small clusters. The complexity of the process is in $O(n^{3/2})$. It can be applied to
non-geometrical problems


$P$ is either directly optimized if it does not contain too many cities, or decomposed
into $r$ sub-paths, where $r$ is a parameter that does not depend on the problem size.
To fix ideas, the value of $r$ is typically between 10 and 20. If $n \le r^2$, a very
good path passing through all the cities of $P$, starting and ending in $c_i$, can be
found, for example, with an ejection chain or even an exact method. This feasible tour
is returned by the heuristic.

Otherwise, if $n > r^2$, the path $P$ is reordered by considering $r$ sub-paths. This
is done by choosing a sample $S$ of $r$ cities, including:

• $u \in C \setminus \{b, e\}$, the city closest to $b$
• $v \in C \setminus \{b, e, u\}$, the city closest to $e$
• $r - 2$ other cities of $C \setminus \{b, e, u, v\}$, randomly picked


A good path $P_S$ through all the cities of sample $S$, starting at city $b$ and ending
at city $e$, can be found with a local search or an exact method. Let us rename the cities
Fig. 6.3 Recursive TSP tour construction. Left: path on a random sample (bold and light line) and
reordered path $P_S$ completed with all cities before the recursive calls of the procedure. Paths
$P_1$ to $P_5$ are drawn with different colors and highlighted by different backgrounds. Right: one
of these paths was recursively decomposed into $r = 5$ pieces. Final state with all sub-paths optimized


of $S$ so that $P_S = (b, s_1, s_2, \ldots, s_{r-1}, s_r, e)$. Path $P_S$ can be completed
to contain all the cities of $C$ by inserting them, one after the other, just after the
closest city of $S$. The completed path $P_S = (b, s_1, \ldots, s_2, \ldots, s_r, \ldots, e)$
improves the initial path $P$. The left side of Fig. 6.3 illustrates this construction
using a sample of $r = 5$ cities. The shaded areas highlight the first $r$ sub-paths found.

At this step, the order of the cities in the completed path $P_S$ between two cities
$s_i$ and $s_{i+1}$ is arbitrary (as it was for $P$ at the beginning of the procedure).
The sub-paths $P_i = (b_i, s_i, \ldots, e_i = s_{i+1}) \subset P_S$, $i = 1, \ldots, r$,
with $s_{r+1} = e$, can be further improved with $r$ recursive calls of the same
procedure, where $b_i$ is the city just preceding the first one of the path $P_i$
(so $b_1 = b$). The right side of Fig. 6.3 illustrates the solution returned by this
recursive procedure.

It can be noted in this figure that only one of the sub-paths has been decomposed.
The others, containing no more than $r^2$ cities, were directly optimized by a
procedure similar to that of Code 12.3. The solution finally obtained is not
excellent, but it was obtained very quickly and is suitable as an initial solution for
partial improvement techniques, like POPMUSIC, which will be detailed in Sect. 6.4.2.
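The recursive procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's implementation. Paths of at most $r^2$ cities are optimized by a nearest-neighbor construction followed by 2-opt, standing in for Code 12.3, an ejection chain or an exact method. The function names optimize_small_path, build_path and linearithmic_tsp are hypothetical. Each sub-path $P_i$ consists of $s_i$ and the cities inserted after it. Its fixed extremities are the last city of the previous sub-path and $s_{i+1}$ (or $e$).

```python
import math
import random


def optimize_small_path(pts, b, e, cities):
    """Order cities on a path from b to e (fixed, excluded from the result):
    nearest-neighbor construction followed by 2-opt."""
    def dist(a, c):
        return math.dist(pts[a], pts[c])

    remaining, order, last = list(cities), [], b
    while remaining:                          # nearest-neighbor construction
        nxt = min(remaining, key=lambda c: dist(last, c))
        remaining.remove(nxt)
        order.append(nxt)
        last = nxt
    full = [b] + order + [e]
    improved = True
    while improved:                           # 2-opt keeping both extremities fixed
        improved = False
        for i in range(1, len(full) - 2):
            for j in range(i + 1, len(full) - 1):
                old = dist(full[i - 1], full[i]) + dist(full[j], full[j + 1])
                new = dist(full[i - 1], full[j]) + dist(full[i], full[j + 1])
                if new < old - 1e-12:
                    full[i:j + 1] = full[i:j + 1][::-1]
                    improved = True
    return full[1:-1]


def build_path(pts, b, e, cities, r, rng):
    """Recursive construction of a path b -> (all cities) -> e."""
    if len(cities) <= r * r:                  # small enough: optimize directly
        return optimize_small_path(pts, b, e, cities)
    u = min(cities, key=lambda c: math.dist(pts[b], pts[c]))
    rest = [c for c in cities if c != u]
    v = min(rest, key=lambda c: math.dist(pts[e], pts[c]))
    rest = [c for c in rest if c != v]
    s = optimize_small_path(pts, b, e, [u, v] + rng.sample(rest, r - 2))
    groups = {c: [c] for c in s}              # s_i followed by the cities inserted after it
    for c in rest:
        if c not in groups:                   # insert c after its closest sample city
            groups[min(s, key=lambda x: math.dist(pts[c], pts[x]))].append(c)
    result, start = [], b
    for i, si in enumerate(s):                # r recursive calls on the sub-paths P_i
        end = s[i + 1] if i + 1 < len(s) else e
        sub = build_path(pts, start, end, groups[si], r, rng)
        result.extend(sub)
        start = sub[-1]                       # city just preceding the next sub-path
    return result


def linearithmic_tsp(pts, r=10, seed=0):
    """Tour seen as a path starting and ending at city 0."""
    rng = random.Random(seed)
    return [0] + build_path(pts, 0, 0, list(range(1, len(pts))), r, rng)
```

Assigning a city to its closest sample city costs $O(r)$ per level. For fixed $r$ and well-spread cities, the recursion depth grows like $\log n$, which is where the $O(n \log n)$ behavior comes from.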


6.4 Local Search for Large Instances 


After reviewing some techniques for constructing solutions to large instances, let us
now look at some techniques for improving them. LNS, POPMUSIC, and Corridor assume
that an initial solution to the problem is available. These techniques are sometimes
called fix-and-optimize [4] or, more recently, magnifying glass heuristics [3]. The key
idea is to fix a relatively large portion of the problem variables and to solve a
sub-problem with additional constraints on the remaining variables. When a heuristic
includes an exact optimization method, we now speak of a matheuristic.


6.4.1 Large Neighborhood Search 


Large neighborhood search (LNS) was proposed by Shaw [5]. The general idea is to
gradually improve a solution by alternating destruction and repair phases. To
illustrate this principle, let us consider integer linear programming. The destruction
phase selects a subset of variables while incorporating some randomness into the
process. In its simplest form, this consists of selecting a constant number of
variables completely at random. A more elaborate form is to randomly select a seed
variable and a number of other variables that are the most related to it. The repair
phase tries to improve the solution by solving a sub-problem on the selected variables,
while the other variables keep the values they have in the starting solution.

The name of this technique comes from the very large number of possibilities for
reconstructing a solution. This number grows exponentially with the size of the
sub-problem, so the possibilities cannot reasonably be enumerated exhaustively. Thus,
the reconstruction phase chooses a solution among a large number of possibilities. As
a significant part of the variables keep their value from one solution to the next,
LNS is conceptually a local search, but with a large neighborhood. The framework of
LNS is given by Algorithm 6.1.


Algorithm 6.1: LNS framework. The destroy, repair, and acceptance functions
must be specified by the programmer, as well as the stopping criterion

Input: Solution s, destroy method d(·), repair method r(·), acceptance criterion a(·,·)
Result: Improved solution s*
s* ← s
repeat
    s' ← r(d(s))
    if a(s, s') then
        s ← s'
    if s' better than s* then
        s* ← s'
until a stopping criterion is satisfied
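Algorithm 6.1 translates almost literally into Python. The sketch below assumes that the stopping criterion is a fixed iteration budget (max_iter), one of the options discussed below. It also includes a deliberately trivial, hypothetical problem so that the loop can be run. Integer variables $x_i \in \{0, \ldots, 9\}$ must approach a target vector. The destroy method frees $k$ random variables, and the repair method re-optimizes them exactly while the others keep their value.

```python
import random


def lns(s, destroy, repair, accept, cost, max_iter=1000):
    """LNS framework of Algorithm 6.1 with an iteration budget as stopping criterion."""
    best = s
    for _ in range(max_iter):
        candidate = repair(destroy(s))        # s' <- r(d(s))
        if accept(s, candidate):              # a(s, s')
            s = candidate
        if cost(candidate) < cost(best):      # keep the best solution s*
            best = candidate
    return best


# Toy usage (hypothetical problem): minimize sum |x_i - target_i|, x_i in {0, ..., 9}.
rng = random.Random(1)
target = [rng.randrange(10) for _ in range(20)]


def cost(x):
    return sum(abs(a - b) for a, b in zip(x, target))


def destroy(x, k=3):
    return x, rng.sample(range(len(x)), k)    # partial solution: (values, freed indices)


def repair(partial):
    x, free = partial
    y = list(x)
    for i in free:                            # exact optimization of each freed variable
        y[i] = min(range(10), key=lambda v: abs(v - target[i]))
    return y


def accept(s, s_new):                         # accept strict improvements only
    return cost(s_new) < cost(s)


result = lns([0] * 20, destroy, repair, accept, cost, max_iter=300)
```

Here the sub-problems are independent, so the optimum is reached as soon as every variable has been freed once. Real applications couple the variables, and the repair method does the actual work.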
This framework leaves the programmer considerable freedom in choosing various
options:


Destroy method d(·) This method is supposed to destroy part of the current
solution. The authors recommend that it be non-deterministic, so that two
successive calls destroy different portions. Another view of this method is that it
fixes a certain number of variables of the problem and releases the others, which
can then be modified. This method additionally includes a parameter that modulates
the amount of destruction. Indeed, if the number of released variables is too
small, the repair method is too constrained to reconstruct the solution differently,
and the algorithm cannot improve the current solution. Conversely, if the number of
released variables is too large, the repair method may have difficulty improving the
current solution. This is particularly true if an exact method is used, which then
requires a prohibitive computational time.

Repair method r(·) This method is supposed to repair the destroyed part of a
solution. Another view of this method is that it re-optimizes the portion of the
problem corresponding to the variables freed by the destroy method. One option for
the repair method is an exact method, for instance, constraint programming. Another
option is a heuristic method, either a simple one, like a greedy algorithm, or a more
advanced one, such as tabu search or variable neighborhood search.

Acceptance criterion a(·,·) The simplest acceptance criterion is to compare the
fitness function values of the two solutions provided as parameters:

$$a(s, s') = \begin{cases} \text{True} & \text{if } s' \text{ is better than } s \\ \text{False} & \text{otherwise} \end{cases}$$

Other criteria have been proposed, for instance, criteria inspired by simulated
annealing (see Sect. 7.1).

Stopping criterion The framework does not suggest any stopping criterion. Authors
frequently use the limit of their patience, expressed in seconds, or even the patience
of other authors who have proposed a concurrent method! This kind of stopping
criterion is hardly convincing. This point is discussed further in Sect. 11.3.4.2.
The closely related POPMUSIC framework, presented in Sect. 6.4.2, incorporates a
natural stopping criterion.
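An acceptance criterion inspired by simulated annealing, as mentioned for $a(\cdot,\cdot)$ above, can be packaged as a closure compatible with the LNS loop. In this Python sketch, the function name make_annealing_accept, the initial temperature t0 and the geometric cooling factor alpha are assumptions, not parameters prescribed by the framework. Improving solutions are always accepted. A worse one is accepted with probability $e^{-\Delta/T}$.

```python
import math
import random


def make_annealing_accept(cost, t0=1.0, alpha=0.99, rng=None):
    """Return a(s, s'): always accept improvements, accept a degradation delta
    with probability exp(-delta / T); T is multiplied by alpha after each call."""
    rng = rng or random.Random()
    temperature = [t0]                        # mutable cell updated by the closure

    def accept(s, s_new):
        delta = cost(s_new) - cost(s)
        t = temperature[0]
        temperature[0] = t * alpha            # geometric cooling (an assumption)
        if delta < 0:
            return True
        return t > 0 and rng.random() < math.exp(-delta / t)

    return accept
```

With a very low temperature, this criterion behaves like the strict-improvement rule. With a very high one, almost every neighbor is accepted.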


To illustrate a practical implementation of this method, let us consider that of
Shaw [5], originally designed for the vehicle routing problem. The destroy method
selects a seed customer at random. The remaining customers are sorted using a
function measuring their relationship with the seed customer. This function is
inversely proportional to the distance between customers and depends on whether the
customers belong to the same tour. The idea is to select a subset of customers who are
close to the seed one but belong to different routes. These customers are randomly
selected, with a bias favoring those most closely related to the seed customer.
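A destroy method in this spirit can be sketched in Python as follows. The exact relatedness function is not given here, so the one below is hypothetical: $1/(\text{distance} + \text{penalty})$, where a penalty of 1.0 applies when the customer is on the same route as the seed. This favors close customers from other routes. The rank-biased draw $\lfloor y^{D} \cdot L \rfloor$, with $y$ uniform in $[0, 1)$, $L$ the number of remaining candidates and $D$ (det) a determinism parameter, is one common way to introduce the bias. It is an assumption of this sketch, not a detail taken from the text.

```python
import math
import random


def related_removal(coords, route_of, q, det=3.0, rng=None):
    """Select q customers to remove: a random seed, then customers drawn
    with a bias toward those most related to the seed."""
    rng = rng or random.Random()
    seed = rng.randrange(len(coords))
    removed = [seed]

    def relatedness(c):                       # hypothetical relatedness measure
        same_route = 1.0 if route_of[c] == route_of[seed] else 0.0
        return 1.0 / (math.dist(coords[seed], coords[c]) + same_route + 1e-9)

    # candidates sorted from most to least related to the seed
    candidates = sorted((c for c in range(len(coords)) if c != seed),
                        key=relatedness, reverse=True)
    while len(removed) < q and candidates:
        # y**det with y in [0, 1) concentrates on small ranks, i.e., related customers
        rank = int(rng.random() ** det * len(candidates))
        removed.append(candidates.pop(rank))
    return removed
```

Larger values of det make the selection closer to deterministic. det = 1 gives a uniform draw among the candidates.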


Fig. 6.4 Illustration of LNS on a VRP instance. The initial solution (a) is destroyed by removing a
few customers (b). The destroyed solution is repaired by optimally inserting the removed customers (c)


The repair method is based on integer linear programming. It implements a branch
and bound technique with constraint propagation and can only modify the variables
associated with the customers chosen by the destroy method. In addition, to prevent
the explosion of computational times common with exact methods, the enumeration
tree is only partially examined and heuristically pruned. A destroy-repair cycle is
illustrated in Fig. 6.4.

Some algorithms following the LNS framework were proposed well before it. Among
them is the shifting bottleneck heuristic for the job shop scheduling problem [1]. In
this article, the destroy method selects the bottleneck machine and frees the
variables associated with the operations processed by this machine. The repair method
reorders these operations, assuming that the sequences on the other machines are not
modified. Hence, each operation on the bottleneck machine has a release time
corresponding to the earliest finishing time of the preceding operation of the same
job. In addition, each operation has a due date, corresponding to the latest starting
time of the following operation of the same job. In this heuristic, all choices are
deterministic and all optimizations are exact. So, the current solution is modified
only if it is strictly improved, and the method has a natural stopping criterion.

The POPMUSIC method presented in the following section was developed
independently of LNS. It can be seen as a less flexible LNS method, in the sense
that it gives the programmer more guidance in the choice of options, particularly
the stopping criterion.


6.4.2 POPMUSIC 


The primary idea of POPMUSIC is to locally optimize a part of an existing solution.
These improvements are repeated until no part that can be improved is detected.
It is, therefore, a local search method. Originally, this method received the less
attractive acronym of LOPT (for local optimizations) [8, 9].
For large problem instances, a solution can be considered as composed of a number of
parts, which are themselves composed of a number of items. Taking the example of
clustering, each cluster can be a part. In addition, it is assumed that a proximity
measure can be defined between the parts and that the parts are somewhat independent
of each other in the solution. In the case of clustering, there are closely related
clusters, containing items that are not well separated, and independent clusters,
that are clearly well separated. If these hypotheses are satisfied, we have the
special conditions needed to develop an algorithm based on the POPMUSIC framework.
The name was proposed by S. Voß. It is the acronym of Partial OPtimization
Metaheuristic Under Special Intensification Conditions.

First, let us assume that a solution $s$ can be represented by a set of $q$ parts
$s_1, \ldots, s_q$, and that we have a method for measuring the proximity between two
parts. The germinal idea of POPMUSIC is to select a seed part $s_g$ and a number
$r < q$ of the parts nearest to $s_g$ to build a sub-problem $R$. With an appropriate
definition of the parts, improving the sub-problem $R$ can reveal an improvement of
the complete solution. Figures 6.5 and 6.6 illustrate what a part and a sub-problem
can be for various applications.




Fig. 6.5 To apply the POPMUSIC framework to a clustering problem, one can define a part as all
the items assigned to the same center. The parts nearest to the seed cluster constitute a
sub-problem that is tentatively optimized independently. Optimizing well separated clusters
cannot improve the solution. Hence, these parts are de facto independent
Fig. 6.6 For the VRP, a part in POPMUSIC can be defined as a tour. Here, the proximity
between tours is the distance between their centers of gravity. A sub-problem consists of
customers belonging to six tours


To avoid optimizing the same sub-problem several times, a set $U$ stores the seed
parts that can define a potentially non-optimal sub-problem. If the tentative
optimization of a sub-problem does not lead to an improvement, then the seed part
used to define it is removed from $U$. Once $U$ is empty, the process stops. If a
sub-problem $R$ has been successfully improved, a number of parts have been modified,
and new improvements become possible in their neighborhood. In this case, all parts of
$U$ that no longer exist in the improved solution are removed before incorporating all
parts of $R$. Algorithm 6.2 formalizes the POPMUSIC method.

To transcribe this framework into code for a given problem, there are several
options:


Obtaining the initial solution POPMUSIC requires a solution before starting. The
technique presented in Sect. 6.3 suggests how to get an appropriate initial
solution with limited computational effort. However, POPMUSIC may also be
applied to instances of limited size. In this case, an algorithm with a higher
complexity can generate the starting solution.
Algorithm 6.2: POPMUSIC framework

Input: Initial solution s composed of q disjoint parts s_1, ..., s_q; sub-problem improvement
method
Result: Improved solution s
U ← {s_1, ..., s_q}
while U ≠ ∅ do
    Select s_g ∈ U    // s_g: seed part
    Build a sub-problem R composed of the r parts of s closest to s_g
    Tentatively optimize R
    if R is improved then
        Update s
        From U, remove the parts no longer belonging to s
        In U, insert the parts composing R
    else    // R not improved
        Remove s_g from U


Definition of a part The definition of a part is not unique for a given problem. In
the VRP case, all customers on the same tour can form a part, as was done in
[2, 7] (see Fig. 6.6). For the same problem, a part can equally be defined as a
single customer, as in [5].

Definition of the distance between parts For some problems, the definition of the
distance between two parts can be relatively easy and logical. For example, [9]
uses the Euclidean distance between centroids for a clustering problem. For map
labeling (Sect. 3.3.3), a graph is built whose vertices represent the objects to be
labeled and whose edges represent potentially incompatible label positions. The
distance of an object to the seed object is the minimum number of edges of a path
connecting them, as shown in Fig. 6.7.

However, this definition can be quite unclear for some problems. For the VRP with
time windows, two geometrically close customers can have incompatible time
windows. Therefore, they should be considered distant.

Several different proximity definitions can be used simultaneously. For school
timetabling, one definition may aim to create sub-problems focusing on groups of
students following the same curriculum, another on teachers, and a third on room
allocation. Of course, if several definitions of proximity between parts are used
simultaneously, the last line of Algorithm 6.2 has to be adapted: a seed part $s_g$
is only removed from $U$ if none of the sub-problems that can be created with $s_g$
improves the solution.

Selection of the seed part To our knowledge, there are no comprehensive studies 
on the impact of the seed part selection process. In the literature, only very simple 
methods are used to manage the set U: either stack or random selection. 
Fig. 6.7 For map labeling, a part can be an object to be labeled (circle with a number). Here, we
consider four possible label positions (rectangles around each object). Two objects are at a distance 
of one if their labels may overlap. The number inside each disc represents the distance from the 
seed object, noted 0. A sub-problem has up to r = 25 objects which are the closest to the seed 
object. Here, the distance is at most 4. The objects whose labels could collide with these r objects 
are included in the sub-problem. Only the positions of the labels of the r objects can be changed 
when optimizing a sub-problem 


Parameter r The size of the sub-problems depends on $r$, the only explicit parameter
of POPMUSIC. Its value depends on the capability of the optimization method. A low
value only allows minor improvements but requires a limited computational
effort. A high value implies a high computational effort but a better potential to
improve the solution.

Sub-problem optimization method The programmer is free to select any sub-problem
optimization method. Since the sub-problem size can be adjusted, the
implementation is facilitated: the method only needs to be efficient for a limited
range of instance sizes. When the optimization method is exact, the POPMUSIC
framework yields a matheuristic.
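Algorithm 6.2 can be sketched as a generic Python function that receives the problem-specific options as parameters. Parts are assumed to be hashable (here frozensets). The seed part is chosen arbitrarily with set.pop(), which also removes it from $U$. To make the sketch runnable, it is followed by a hypothetical toy application: one-dimensional clustering where a part is a cluster and the proximity between parts is the distance between their means. Each sub-problem is optimized by a few iterations of Lloyd's $k$-means algorithm. That optimization method is our choice of illustration, not one prescribed by the framework.

```python
import random


def popmusic(parts, nearest_parts, optimize, r):
    """Generic POPMUSIC loop (Algorithm 6.2).
    nearest_parts(seed, solution, r) -> list of r parts, seed first
    optimize(sub_parts) -> list of new parts if improved, else None"""
    solution = set(parts)
    U = set(solution)
    while U:
        seed = U.pop()                        # arbitrary seed; pop() removes it from U
        R = nearest_parts(seed, solution, r)
        new_parts = optimize(R)
        if new_parts is not None:             # R improved: update s and U
            solution.difference_update(R)
            solution.update(new_parts)
            U &= solution                     # drop parts no longer belonging to s
            U.update(new_parts)               # insert the parts composing R
    return solution


# Toy application (hypothetical): 1-D clustering, a part being a cluster.
def mean(p):
    return sum(p) / len(p)


def sse(p):
    m = mean(p)
    return sum((x - m) ** 2 for x in p)


def nearest_parts(seed, solution, r):
    others = sorted((p for p in solution if p != seed),
                    key=lambda p: abs(mean(p) - mean(seed)))
    return [seed] + others[:r - 1]


def optimize(R, iters=10):
    """A few Lloyd (k-means) iterations restricted to the points of R."""
    points = [x for p in R for x in p]
    centers = [mean(p) for p in R]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in points:
            groups[min(range(len(centers)), key=lambda i: abs(x - centers[i]))].append(x)
        groups = [g for g in groups if g]
        centers = [mean(g) for g in groups]
    new_parts = [frozenset(g) for g in groups]
    improved = sum(map(sse, new_parts)) < sum(map(sse, R)) - 1e-9
    return new_parts if improved else None
```

Each accepted sub-problem optimization strictly decreases the total sum of squared errors, so the loop stops once no seed part leads to an improvement, exactly as in Algorithm 6.2.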


Looking at the stopping criterion (the set $U$ is empty), the computational effort
could potentially be prohibitive for large instances, since several parts are
introduced into $U$ after each sub-problem improvement. In practice, the number of
sub-problems to solve grows almost linearly with the instance size, as illustrated in
Fig. 6.8 for a location-routing problem [2] and in Fig. 6.10 for the TSP.
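For the map labeling example given earlier, the distance between parts (the minimum number of edges of a path to the seed object in the graph of potentially overlapping labels) can be computed by breadth-first search, which visits objects by non-decreasing distance. In the following Python sketch, the function name closest_objects and the adjacency-list representation of the conflict graph are assumptions. It returns the up to $r$ objects closest to the seed, which form the core of a sub-problem.

```python
from collections import deque


def closest_objects(conflicts, seed, r):
    """Up to r objects closest to seed in the conflict graph (adjacency lists),
    with their edge-count distances, found by breadth-first search."""
    dist = {seed: 0}
    order = [seed]
    queue = deque([seed])
    while queue and len(order) < r:
        v = queue.popleft()
        for w in conflicts[v]:
            if w not in dist:                 # first visit gives the shortest distance
                dist[w] = dist[v] + 1
                order.append(w)
                queue.append(w)
                if len(order) == r:
                    break
    return order, dist
```

For the small graph {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]} with seed 0 and r = 4, the function returns the objects [0, 1, 2, 3], and object 3 is at distance 2.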


6.4.2.1 POPMUSIC for the TSP 


An elementary implementation of the POPMUSIC technique for the traveling 
salesman problem is given by Code 6.1. In this adaptation, a part is a city. The 
Fig. 6.8 Computational time observed for creating an initial solution to a location-routing problem
with the technique presented in Sect. 6.3, and overall optimization time for sub-problems with
the POPMUSIC framework. The growth of the computation time seems lower than the
$\Theta(n^{3/2})$ analysis of Sect. 6.3, and the time for optimizing the sub-problems is almost linear


distance between parts is measured by the number of intermediate cities that there 
are along the current tour. This contrasts with a measure using the distance matrix. A 
sub-problem is, therefore, a path of 2r cities whose extremities are fixed. We seek to 
move a sub-path of at most r cities in the sub-problems, using a 3-opt neighborhood. 
The set U is not represented explicitly because it is identified to the tour. Indeed, 
successive sub-problems are just defined by a single city shift. To determine whether 
to continue to optimize, the initial city of the last sub-path that was successfully 
optimized is stored. If all starting cities are tried without improvement, the process 
stops. 
Code 6.1 tsp_3opt_limited.py Basic POPMUSIC implementation for the TSP

######### POPMUSIC for the TSP based on 3-opt neighborhood
def tsp_3opt_limited(d,        # Distance matrix
                     r,        # Subproblem size
                     succ,     # Tour provided and returned
                     length):  # Tour length
    n = len(succ)
    if r > n - 2:              # Subproblem size must not exceed n - 2
        r = n - 2
    i = last_i = 0             # Starting city is index 0
    while True:
        j = succ[i]
        t = 0
        # Do not exceed subproblem and the limits of the neighborhood
        while t < r and succ[succ[j]] != i:
            k = succ[j]
            u = 0
            while u < r and succ[k] != i:
                delta = d[i][succ[j]] + d[j][succ[k]] + d[k][succ[i]] \
                      - d[i][succ[i]] - d[j][succ[j]] - d[k][succ[k]]
                if delta < 0:                     # Is there an improvement?
                    length += delta               # Perform move
                    succ[i], succ[j], succ[k] = succ[j], succ[k], succ[i]
                    j, k = k, j                   # Replace j between i and k
                    last_i = i
                u += 1
                k = succ[k]                       # Next k
            t += 1
            j = succ[j]                           # Next j
        i = succ[i]                               # Next i

        if i == last_i:  # A complete tour scanned without improvement
            break

    return succ, length


In order to successfully adapt the POPMUSIC technique to the TSP, it is 
necessary to pay attention to some issues: 


* The initial solution must already possess an appropriate structure; for a Euclidean 
problem, it should not include two intersecting edges belonging to portions of 
routes that are separated by a long sequence of cities, because the optimization 
procedure will be unable to uncross them. 

* Rather than developing an ad hoc local search like the one in Code 6.1 to 
optimize sub-paths, it is easier to use a general TSP solving method, for instance, 
Code 12.3. 

* Finally, we must avoid re-optimizing a sub-path that has already been optimized.


To start with a solution having an appropriate structure, without using an algorithm of high complexity, we can follow the lines of the technique presented in Sect. 6.3.2. As the empirical complexity of POPMUSIC is linear, a solution of satisfactory quality can be obtained in $O(n \log n)$ [10]. In practice, the time to build an initial solution is negligible compared to the time to improve it with POPMUSIC, even for instances with billions of cities. The process can be sped up as follows, without significantly degrading the final solution: the tour over $n$ cities is cut into $\lceil n/r \rceil$ sub-paths of approximately $r$ cities. These sub-paths are connected only by their extremities; therefore, they can be optimized independently.

Once all these sub-paths have been optimized, the tour is shifted by $r/2$ cities, and $\lceil n/r \rceil$ sub-paths overlapping the previous ones are optimized. Thus, with $2 \cdot \lceil n/r \rceil$ sub-path optimizations, we get a relatively good tour. Figure 6.9 illustrates this process on the small instance solution shown in Fig. 6.3.

Fig. 6.9 On the right, independent optimizations of four sub-paths. The bold lines highlight the tour after optimization. The thin lines are those of the initial tour. On the left, the tour is shifted and the process is repeated
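The two-pass decomposition just described can be sketched as follows. This is a minimal sketch, assuming a dense distance matrix `d` and a tour stored as a list of cities. The names `two_opt_path`, `fast_popmusic` and `tour_length` are hypothetical, and the sub-path optimizer is a plain 2-opt descent with fixed extremities, used here only as a stand-in for the general TSP method (Code 12.3) recommended above.

```python
import math
import random
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def two_opt_path(d: Matrix, path: List[int]) -> None:
    """In-place 2-opt descent on an open path; path[0] and path[-1] never move."""
    improved = True
    while improved:
        improved = False
        for a in range(len(path) - 3):
            for b in range(a + 2, len(path) - 1):
                # Replace edges (path[a], path[a+1]) and (path[b], path[b+1])
                delta = (d[path[a]][path[b]] + d[path[a + 1]][path[b + 1]]
                         - d[path[a]][path[a + 1]] - d[path[b]][path[b + 1]])
                if delta < -1e-12:                       # strict improvement only
                    path[a + 1:b + 1] = path[a + 1:b + 1][::-1]
                    improved = True


def fast_popmusic(d: Matrix, tour: List[int], r: int) -> List[int]:
    """Two passes of ceil(n/r) independent sub-path optimizations."""
    n = len(tour)
    tour = tour[:]
    for shift in (0, r // 2):               # second pass: tour shifted by r/2 cities
        tour = tour[shift:] + tour[:shift]
        for start in range(0, n, r):        # ceil(n/r) consecutive sub-paths
            size = min(r, n - start)
            idx = [(start + k) % n for k in range(size + 1)]  # shared extremities
            path = [tour[i] for i in idx]
            two_opt_path(d, path)           # extremities stay in place
            for i, city in zip(idx, path):
                tour[i] = city
    return tour


def tour_length(d: Matrix, tour: Sequence[int]) -> float:
    return sum(d[tour[i - 1]][tour[i]] for i in range(len(tour)))
```

Because consecutive sub-paths share only their extremities, which the optimizer never moves, the sub-paths of one pass could also be optimized in parallel.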

Figure 6.10 gives the evolution of the computational time as a function of the number of cities. Figure 6.11 measures the quality of the solutions that can be obtained with these techniques. Interestingly, for a billion-city instance, the greedy nearest neighbor heuristic (Code 4.3) would have required a few decades or a few centuries to provide a solution deviating by about 22% from the optimum.


6.4.3 Comments 


The chief difference between LNS and POPMUSIC is that the latter unequivocally defines the stopping criterion and the neighbor solution acceptance. Indeed, POPMUSIC modifies the solution only if it is strictly improved. For several problems, this framework seems sufficient to obtain good quality solutions, the latter being strongly conditioned by the capacity of the optimization method used. The philosophy of POPMUSIC is to keep the framework as simple as possible. If necessary, the optimization method is improved so that it can better address larger sub-problems. Thus, the framework is kept simple, without adding complicated stopping criteria.

Fig. 6.10 Computational times for building an initial TSP solution with the technique presented in Sect. 6.3.2; optimizing it with a fast POPMUSIC (sub-paths of 225 cities); building a solution with the nearest neighbor heuristic; building a tour with a one-level recursion method (see Problem 6.4); and optimizing a tour with a standard POPMUSIC (sub-paths of 50 cities). The increase in time for building a solution, in $O(n \log n)$, is higher than that of optimizing it with POPMUSIC. However, the latter still takes more time for an instance with more than 2 billion cities

Fig. 6.11 Quality of the solutions obtained with the constructive method presented in Sect. 6.3.2, and once improved with fast POPMUSIC; quality of the nearest neighbor heuristic and of a standard POPMUSIC starting from an initial solution obtained with a single recursion level. The problem instances are generated uniformly in the unit square, with toroidal distances (as if the square were folded so that opposite borders are contiguous). For such a distance measure, a statistical approximation of the optimal solution length is known. The fluctuations for the initial solution reflect the recursion levels

Defining parts and their proximity in POPMUSIC is perhaps a more intuitive way than in LNS to formalize a set of constraints that are added to the problem on the basis of an existing solution. These constraints allow using an optimization method that would be inapplicable to the complete instance. The Corridor Method [6] takes the problem from the other end: given an optimization method that works well (in its authors' application, dynamic programming), how can constraints be added to the problem so that this optimization method can still be used? The components or options of a method are often interdependent: choosing one option affects the others. This may explain why very similar methods are presented under different names.


Problems 


6.1 Dichotomic Search Complexity 

By applying the master recurrence theorem (Sect. 5.2.1), determine the algorithmic 
complexity of searching for an element in a sorted array by means of a dichotomic 
search. 


6.2 POPMUSIC for the Flowshop Sequencing Problem 

For implementing a POPMUSIC-based method, how can a part and a sub-problem be defined for the flowshop sequencing problem? How can the interaction between the sub-problem and the parts that should not be optimized be taken into account?


6.3 Algorithmic Complexity of POPMUSIC 

In a POPMUSIC application, the size of the sub-problem is independent of the size of the problem instance. Hence, any sub-problem can be solved in constant time. Empirical observations, like those presented in Fig. 6.8, show that the number of times a part is inserted in U is also independent of the instance size. In terms of algorithmic complexity, what are the most complex steps of POPMUSIC?


6.4 Minimizing POPMUSIC Complexity for the TSP 

A technique for creating an appropriate TSP tour is as follows: first, a random sample of $k$ cities among $n$ is selected. A good tour on the sample is obtained with a heuristic method. Let us suppose that the complexity of this method is $O(k^a)$, where $a$ is a constant larger than 1. Then, for each of the remaining $n - k$ cities, we find the nearest city of the sample. In the partial tour, each remaining city is inserted (in any order) just after the sample city identified as the nearest. Finally, sub-paths of $r$ cities of the tour thus obtained are optimized with POPMUSIC. The value of $r$ is supposed to be in $O(n/k)$. Also, it is supposed that the total number of sub-paths optimized with POPMUSIC is in $O(n)$. The paths are optimized using the same heuristic method as for finding a tour on the sample.
Fig. 6.12 TSP tour partially optimized with POPMUSIC. The initial tour is obtained with a one-level recursive method. The tour on a sample of the cities is shown in bold


Figure 6.12 illustrates the process. The sample size $k$ depends on the number of cities. We suppose that $k = O(n^h)$, where $h$ is to be determined. The sub-paths optimized with POPMUSIC have a number of cities proportional to $n/k$. Determine the value $h(a)$ that minimizes the global complexity of this method.
Part III
Popular Metaheuristics


This part reviews about 20 popular metaheuristics. They are grouped according to the core heuristic principle they exploit. The first group includes methods relying exclusively on random components. Second are methods that attempt to learn how to build new solutions. These are followed by methods that learn how to modify solutions with a local search. Finally, there are methods that exploit a population of solutions. Table 1 lists the metaheuristics presented in this part.

The last chapter provides tips for designing new heuristics. 
Table 1 Metaheuristics that are addressed in the third part with a brief description of their operating principles

| Chapter | Metaheuristic | Operating principles |
|---|---|---|
| 7. Randomized methods | Simulated annealing | Biased random local search |
| | Late acceptance hill climbing | Local search + history |
| | Variable neighborhood search | Several neighborhoods |
| | GRASP | Biased random construction + local search |
| 8. Construction learning | MIN-MAX ant system | Construction with learning + local search |
| | | Construction learning |
| 9. Local search learning | | Local search + memory |
| | | Various memories |
| 10. Population management | Genetic algorithm | Simple population evolution |
| | Memetic algorithm | Advanced population management + local search |
| | Particle swarm | Interactions between continuous solutions |
| | Electromagnetic method | |

Chapter 7
Randomized Methods

By applying the principles presented in the previous chapters, we can build a solution and improve it to find a local optimum. In addition, if the problem is complicated or large, we have seen how to decompose it into sub-problems that are easier to solve.

What if, using these techniques, we obtain solutions whose quality is not good 
enough? Let us suppose that we can devote more computational time to finding 
better solutions. The first option coming to mind is to try to introduce a learning 
process. That is the subject of subsequent chapters. 

A second option, a priori simpler to implement, is to incorporate random components into an “improvement” method in which solutions of lower quality than the starting one may be chosen. Although we no longer have a strict improvement at each iteration, this is still a local search, since such methods are based on locally modifying solutions. Several very similar methods follow this second option: simulated annealing (SA), threshold accepting (TA), great deluge, demon algorithms, and the noising method. Interestingly, the latter framework can be seen as a generalization of the previous methods. Late acceptance hill climbing shares similarities with these methods but incorporates a self-tuning of its acceptance parameter. Variable neighborhood search (VNS) alternates intensification and diversification phases: it improves a solution with a basic neighborhood and degrades it by randomly selecting moves in other neighborhoods.

A third option is to repeat constructions with random choices, possibly followed 
by local improvements. This is the way followed by the greedy randomized adaptive 
search procedure (GRASP). 
7.1 Simulated Annealing 


The simulated annealing method is one of the first local search techniques that do not strictly improve the quality of the solution at each iteration. This method is inspired by a physical process, annealing, which minimizes the internal energy of the molecules of a material. Some materials, like metals, have their internal structure modified depending on the temperature to which they are heated. When the material is cooled rapidly, the molecules do not have time to arrange themselves into the usual low-temperature structure; instead, they form grains, which are small crystals whose orientation differs from one grain to another. This is the quenching process, which is used in particular to harden some steels.

On the contrary, if the cooling is slow, the molecules manage to form much larger crystals, corresponding to their minimum energy state. By repeating the method, one can further increase the size of the crystals or even obtain a monocrystal. This is the annealing process. These two processes are illustrated in Fig. 7.1.
Fig. 7.1 Annealing and quenching processes. The material is carried to a temperature at which its molecules have enough energy to move. By cooling it slowly, the molecules have time to reach a crystalline state, minimizing their energy. This is the annealing process. If the cooling is faster, this is the quenching process, and disordered crystals are formed. Under certain conditions, a very fast cooling does not leave time for crystals to form, and the material remains in an amorphous state
Cerny [1] and Kirkpatrick et al. [8] independently had the idea of simulating this process in combinatorial optimization, making the analogy between the objective function to minimize and the energy of the molecules. At high temperatures, a molecule has enough energy to fill a gap in the crystal lattice or change its configuration. At low temperatures, however, it has a significantly lower probability of doing so. In terms of combinatorial optimization, this means changing a solution locally and randomly and accepting its degradation with a certain probability, which must be low if the degradation is significant.

Expressed in terms of local search, this corresponds to generating a random move $m \in M$ and calculating the cost difference $\Delta$ between the modified and initial solutions: $\Delta = f(s \oplus m) - f(s)$. If $\Delta < 0$, the move $m$ improves $s$ and is accepted; the new solution becomes $s \oplus m$. Otherwise, the move $m$ may still be accepted, with a probability proportional to $e^{-\Delta/T}$, where $T$ is a parameter simulating the temperature. At each step, the temperature $T$ is decreased. Several formulas have been proposed to adjust the temperature. Among the most frequently encountered are $T \leftarrow \alpha \cdot T$ and $T \leftarrow \frac{T}{1 + \alpha T}$, where $0 < \alpha < 1$ is the parameter adjusting the decreasing speed of the temperature. The method comprises at least two other parameters: $T_{init}$ and $T_{end}$, the initial and final temperatures. Algorithm 7.1 provides the framework for basic simulated annealing.


Algorithm 7.1: Elementary simulated annealing. Countless variants of algorithms based on this framework have been proposed. Practical implementations do not return the solution s of the last iteration but the best solution found throughout the search

Input: Initial solution s; fitness function f to minimize; neighborhood structure M, parameters T_init, T_end < T_init and 0 < α < 1
Result: Modified solution s
1 T ← T_init
2 while T > T_end do
3   Randomly generate m ∈ M
4   Δ = f(s ⊕ m) − f(s)
5   Randomly generate 0 < u < 1
6   if Δ < 0 or e^(−Δ/T) > u then  // m is accepted
7     s ← s ⊕ m
8   T ← α · T


Fig. 7.2 Changes in TSP tour length and temperature evolution during simulated annealing

This framework is generally modified. First, the parameters defining the initial and final temperatures have very different effects depending on the unit in which the fitness function is measured. Indeed, if $f$ measures the length of a TSP tour, these parameters should be adapted depending on whether the unit is meters or kilometers. To make the algorithm more robust, the user is not asked to provide temperatures directly. Instead, the user specifies, for instance, acceptance rates $\tau_{init}$ and $\tau_{end}$ for degrading moves, which is much more intuitive. The temperatures are then calculated automatically according to these rates. A random walk performing a few hundred or a few thousand steps can record statistics on the average $\Delta$ values.
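One way to turn acceptance rates into temperatures is sketched below. The calibration rule is an assumption (a common choice, not necessarily the book's): $T$ is chosen so that a degrading move of average amplitude $\bar{\Delta}$ is accepted with the requested probability $\tau$, that is, $e^{-\bar{\Delta}/T} = \tau$, hence $T = -\bar{\Delta}/\ln \tau$. The statistics come from a random walk of 2-opt moves that are all performed. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def random_walk_deltas(d: Sequence[Sequence[float]], tour: List[int],
                       steps: int, rng: random.Random) -> List[float]:
    """Cost differences of `steps` random 2-opt moves, each one performed."""
    n = len(tour)
    tour = tour[:]
    deltas = []
    for _ in range(steps):
        i = rng.randrange(n)
        j = (i + rng.randint(2, n - 2)) % n     # non-adjacent edges
        if j < i:
            i, j = j, i
        a, b = tour[i], tour[i + 1]
        c, e = tour[j], tour[(j + 1) % n]
        deltas.append(d[a][c] + d[b][e] - d[a][b] - d[c][e])
        tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]  # random walk: always move
    return deltas


def temperature_for_rate(deltas: Sequence[float], rate: float) -> float:
    """T such that a degrading move of average amplitude is accepted with probability `rate`."""
    degrading = [x for x in deltas if x > 0]
    mean = sum(degrading) / len(degrading)
    return -mean / math.log(rate)
```

For example, `temperature_for_rate(deltas, 0.5)` and `temperature_for_rate(deltas, 0.01)` would give an initial and a final temperature; any rates with $0 < \tau_{end} < \tau_{init} < 1$ yield $T_{end} < T_{init}$.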

Frequently, the temperature is not decreased at each iteration (Line 8). Another 
parameter is introduced, defining the number of iterations performed with a given 
temperature. 

Figure 7.2 illustrates the evolution of the tour length for a TSP with 225 cities. Code 7.1 was executed with an initial temperature of $5 \cdot d_{max}/n$, a final temperature of $20 \cdot d_{max}/n^2$ and $\alpha = 0.99$, where $d_{max}$ is the largest distance between two cities. The algorithm was provided with a relatively good initial tour. This tour deteriorates significantly during the first iterations, at high temperatures. This degradation is necessary to alter the structure of the starting solution and discover better solutions. About half of the iterations are carried out unnecessarily in this run, as the value of the best solution found no longer evolves.

Code 7.1 implements a very basic simulated annealing for the TSP, based on the 2-opt neighborhood structure. Algorithm 7.1 is adapted to decrease the temperature only every $n^2$ iterations rather than at each iteration. Thus, a value of $\alpha$ between 0.8 and 0.99 produces satisfactory results, regardless of the instance size. Finally, the user must provide the absolute values of the initial and final temperatures. As mentioned above, $T_{init} = 5 \cdot d_{max}/n$ and $T_{end} = 20 \cdot d_{max}/n^2$ are values that can be suitable for starting a parameter tuning.
Code 7.1 tsp_SA.py A basic simulated annealing implementation for the TSP

import math
from random_generators import unif, rando  # Listing 12.1

######### Basic Simulated Annealing for the TSP, based on 2-opt moves
def tsp_SA(d,                    # Distance matrix
           tour,                 # TSP tour
           length,               # Tour length
           initial_temperature,  # SA parameters
           final_temperature,
           alpha):

    n = len(tour)
    best_length = length
    best_tour = tour[:]          # Copy, since tour is modified in place below
    T = initial_temperature
    iteration = 0
    while T > final_temperature:
        i = unif(0, n - 1)                # First city of a move randomly chosen
        j = (i + unif(2, n - 2)) % n      # Second city is unif successors further
        if j < i:
            i, j = j, i                   # j must be further on the tour
        delta = d[tour[i]][tour[j]] + d[tour[i + 1]][tour[(j + 1) % n]] \
              - d[tour[i]][tour[i + 1]] - d[tour[j]][tour[(j + 1) % n]]
        if delta < 0 or math.exp(-delta / T) > rando():
            length = length + delta                 # Move accepted
            for k in range((j - i) // 2):           # Reverse sub-path between i and j
                tour[k + i + 1], tour[j - k] = tour[j - k], tour[k + i + 1]
            # Is there an improvement?
            if best_length > length:
                best_length = length
                best_tour = tour[:]
                print('SA', iteration, length)
        iteration += 1
        if iteration % (n * n) == 0:
            T *= alpha                              # Decrease temperature
    return best_tour, best_length


7.2 Threshold Accepting 


Threshold accepting, proposed by Dueck and Scheuer [4], is a pure local search: it only moves from a solution to one of its neighbors. As in simulated annealing, demon, and great deluge algorithms, the moves are chosen randomly but are not systematically applied to the current solution. In the case of an improving move, the neighbor solution is accepted. If the move deteriorates the solution, it is only accepted if the deterioration is lower than a given threshold. The latter is gradually decreased to reach zero, so that the method stops in a local optimum. Algorithm 7.2 provides the threshold accepting framework.
Algorithm 7.2: Threshold accepting. The values of the thresholds τ_1, …, τ_T are not necessarily provided explicitly but may be calculated, for instance, by providing only the initial threshold and multiplying it by another parameter, α, at each round of R iterations

Input: Initial solution s; fitness function f to minimize; neighborhood structure M, parameters T, R, τ_1, …, τ_T
Result: Solution s*
1 s* ← s
2 for t from 1 to T do
3   for R iterations do
4     Randomly generate m ∈ M
5     if f(s ⊕ m) − f(s) < τ_t then  // the move m is accepted
6       s ← s ⊕ m
7       if f(s) < f(s*) then
8         s* ← s
Fig. 7.3 Technique for determining the thresholds $\tau_1, \dots, \tau_T$. The empirical distribution function $F$ (proportion of moves with $|\Delta| < x$, as a function of $x$ up to $\Delta_{max}$; the thresholds satisfy $\tau_T = 0 < \dots < \tau_2 < \tau_1$) is obtained by performing a number of random moves and recording their amplitude in absolute value


Gilli et al. [6] proposed an automated method for setting the thresholds. First, random moves are performed, recording the fitness difference of neighbor solutions. This yields the empirical distribution function $F$ of the amplitude of the moves. Then, the $T$ thresholds are fixed using the inverse of this function (see Fig. 7.3): $\tau_t = F^{-1}(0.8 \cdot (T - t)/T)$. Put differently, the first threshold $\tau_1$ accepts about 80% of the degrading moves, while the last, $\tau_T = 0$, only allows improvements.
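The threshold computation and Algorithm 7.2 can be sketched as below. This is a minimal sketch, assuming a solution can be any Python value and that `random_neighbor(s, rng)` is a hypothetical callable returning a random neighbor $s \oplus m$. The inverse $F^{-1}(q)$ is taken as the smallest recorded amplitude whose empirical frequency reaches $q$, with $F^{-1}(0) = 0$; the function names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")


def thresholds(abs_deltas: Sequence[float], T: int) -> List[float]:
    """tau_t = F^{-1}(0.8 (T - t) / T) for t = 1..T, F being the empirical
    distribution function of the recorded amplitudes |delta|."""
    sample = sorted(abs_deltas)
    m = len(sample)
    taus = []
    for t in range(1, T + 1):
        q = 0.8 * (T - t) / T            # target value of F(tau_t)
        k = math.ceil(q * m) - 1         # smallest sample value with F >= q
        taus.append(sample[k] if k >= 0 else 0.0)
    return taus


def threshold_accepting(s: S, f: Callable[[S], float],
                        random_neighbor: Callable[[S, random.Random], S],
                        taus: Sequence[float], R: int,
                        rng: random.Random) -> Tuple[S, float]:
    """Threshold accepting for minimization: R random moves per threshold."""
    fs = f(s)
    best, fbest = s, fs
    for tau in taus:
        for _ in range(R):
            cand = random_neighbor(s, rng)
            fc = f(cand)
            if fc - fs < tau:            # deterioration below the threshold
                s, fs = cand, fc
                if fs < fbest:
                    best, fbest = s, fs
    return best, fbest
```

With the last threshold equal to 0, the final round only accepts strictly improving moves, as stated above.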
7.3 Great Deluge Algorithm


The great deluge algorithm, proposed by Dueck [3], has similarities with the previous one. However, the search progression is limited by the absolute value of the fitness function instead of the amplitude of the moves. The name of this method comes from the legend in which, as a result of incessant rain, all terrestrial beings eventually drowned, except for those on Noah’s Ark. The animals on the ground panicked and ran in all directions, everywhere except where there was water.

The analogy with a maximization process is made by considering random moves that are accepted as long as they do not lead to a solution whose quality is below a threshold $L$. The latter is the water level, which rises by $P$ at each iteration; this parameter simulates the rain strength. The process stops when the value of the current solution is no longer above $L$. Algorithm 7.3 provides the framework of this method. Its operating principle is illustrated in Fig. 7.4.


Algorithm 7.3: Great deluge algorithm. The algorithm can be adapted to minimization problems, thus simulating the behavior of fish when the water level drops!

Input: Solution s, fitness function f to maximize, neighborhood structure M, parameters L and P
Result: s*
1 s* ← s
2 while f(s) > L do
3   Generate a random move m ∈ M
4   if f(s ⊕ m) > L then
5     s ← s ⊕ m
6     if f(s) > f(s*) then
7       s* ← s
8   L ← L + P


Fig. 7.4 Illustration of the great deluge algorithm. State of the landscape at the beginning of the rain (a). The water level rises, (b), (c), until only the highest peaks emerge (d)
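Algorithm 7.3 translates almost line by line into Python. This is a minimal sketch for maximization, assuming $P > 0$ and a fitness bounded above (so that the rising water eventually stops the loop), with `random_neighbor(s, rng)` a hypothetical callable returning a random neighbor of `s`.

```python
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def great_deluge(s: S, f: Callable[[S], float],
                 random_neighbor: Callable[[S, random.Random], S],
                 L: float, P: float, rng: random.Random) -> Tuple[S, float]:
    """Great deluge algorithm for maximization (sketch of Algorithm 7.3)."""
    fs = f(s)
    best, fbest = s, fs
    while fs > L:                      # stop once the current solution is under water
        cand = random_neighbor(s, rng)
        fc = f(cand)
        if fc > L:                     # the move does not lead under water
            s, fs = cand, fc
            if fs > fbest:
                best, fbest = s, fs
        L += P                         # the rain raises the water level
    return best, fbest
```

Note that the search may stop on a slope, as soon as the water reaches the current solution; this is why the best solution visited is returned.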
7.4 Demon Algorithm


The demon algorithm simulates the behavior of a compulsive gambler who always bets larger amounts of money. The gambling demon pushes the gambler to spend money futilely as soon as the winnings exceed a certain threshold, $D_{max}$. Once this threshold is reached, the gambler continues to play. The gambler enters the casino with a sum $D$ in pocket and stops playing when exhausted, after $I_{max}$ bets. This last parameter appears in many iterative local searches to adjust the computational effort directly. Translated in terms of local search, a bet consists of allowing a degrading move, but the loss cannot exceed the available budget. If the move improves the solution, the budget is increased accordingly, up to the maximum threshold. Algorithm 7.4 provides the framework of this method.


Algorithm 7.4: Demon algorithm. This algorithm is relatively simple to implement, but, as for threshold accepting, its parameters must be adjusted according to the numerical values of the data

Input: Solution s, fitness function f to minimize, neighborhood structure M, parameters I_max, D, D_max
Result: s*
1 s* ← s
2 for I_max iterations do
3   Randomly generate a move m ∈ M
4   Δ = f(s ⊕ m) − f(s)
5   if Δ < D then
6     D ← D − Δ
7     if D > D_max then
8       D ← D_max
9     s ← s ⊕ m
10    if f(s*) > f(s) then
11      s* ← s
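A direct Python rendering of Algorithm 7.4 may help to see how the budget evolves. This is a minimal sketch for minimization, assuming `random_neighbor(s, rng)` is a hypothetical callable returning a random neighbor of `s`; lines 6 to 8 of the algorithm are merged into a single `min`.

```python
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def demon(s: S, f: Callable[[S], float],
          random_neighbor: Callable[[S, random.Random], S],
          I_max: int, D: float, D_max: float,
          rng: random.Random) -> Tuple[S, float]:
    """Demon algorithm for minimization (sketch of Algorithm 7.4)."""
    fs = f(s)
    best, fbest = s, fs
    for _ in range(I_max):
        cand = random_neighbor(s, rng)
        fc = f(cand)
        delta = fc - fs
        if delta < D:                  # the budget covers the possible loss
            D = min(D - delta, D_max)  # pay the loss, or cash the gain up to D_max
            s, fs = cand, fc
            if fbest > fs:
                best, fbest = s, fs
    return best, fbest
```

Since the budget never exceeds $D_{max}$, no degradation of $D_{max}$ or more is ever accepted, which makes $D_{max}$ play a role similar to a fixed threshold.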


7.5 Noising Methods 


The noising methods, proposed by Charon and Hudry [2], assume that the data of the problem are not known with infinite precision. Under these conditions, even if the problem is convex and simple, an improvement method can be trapped in local optima resulting from artifacts. To make the improvement method more robust, random noise is added either to the data or to the move evaluation. For instance, the coordinates of a Euclidean TSP can be slightly changed. Taking these stochastic values into account, the resulting fitness function has no local optima: they are, in a sense, erased by the random noise. The framework of noising methods is provided by Algorithm 7.5.


Algorithm 7.5: Noising method. At each move evaluation, random noise is generated according to the probability distribution noise(i), whose variance generally decreases with i

Input: Solution s, fitness function f to minimize, neighborhood structure M, parameters I_max, noise(i)
Result: s*
1 s* ← s
2 for all i ∈ 1 … I_max do
3   Randomly generate a move m ∈ M
4   if f(s ⊕ m) + noise(i) < f(s) then
5     s ← s ⊕ m
6     if f(s) < f(s*) then
7       s* ← s


A parameter of the method is a probability distribution that depends on the iteration number. Each time a solution is evaluated, a random noise occurrence is generated. Generally, its expectation is zero, and its variance decreases with the iteration number (see Fig. 7.5).

At the end of the algorithm, the method gets closer and closer to an improvement method. The variance of the noise function must naturally depend on the numerical data of the problem. To achieve this goal, one can incorporate the evaluation of the objective function in the probability distribution. The choice of the latter can transform a noising method into simulated annealing, threshold accepting, or the other techniques described above. Code 7.2 implements a kind of noising method for the TSP.

Fig. 7.5 Usual example of the evolution of the random noise distribution (noise level) as a function of the number of iterations performed by a noising method. The color density represents the probability density


Code 7.2 tsp_noising.py Implementation of a noising method for the TSP. The noise is generated from a uniform random number, and its amplitude decreases exponentially with the number of iterations performed

from build_2opt_data_structure import build_2opt_data_structure  # Listing 5.3
from random_generators import unif, rando                       # Listing 12.1
from tsp_utilities import tsp_2opt_data_structure_to_tour       # Listing 12.2
from math import log

######### Noising method for the TSP
def tsp_noising(d,              # Distance matrix
                tour,           # TSP tour
                length,         # Tour length
                initial_noise,  # Parameters
                final_noise,
                alpha):

    n = len(tour)
    t = build_2opt_data_structure(tour)
    current_noise = initial_noise
    best_length = length
    iteration = 0
    while current_noise > final_noise:
        last_i = unif(0, n - 1)  # First city of a move randomly chosen
        i = 2 * last_i           # Entry of this city in t
        while t[t[i]] >> 1 != last_i and t[i] >> 1 != last_i:
            j = t[t[i]]
            while j >> 1 != last_i and (t[j] >> 1 != last_i or i >> 1 != last_i):
                delta = d[i >> 1][j >> 1] + d[t[i] >> 1][t[j] >> 1] \
                      - d[i >> 1][t[i] >> 1] - d[j >> 1][t[j] >> 1]
                if delta + current_noise * log(rando()) < 0:  # SA criterion
                    length = length + delta            # Move accepted
                    best_i, best_j = t[i], t[j]
                    t[i], t[j] = j ^ 1, i ^ 1          # New successors and predecessors
                    t[best_i ^ 1], t[best_j ^ 1] = best_j, best_i
                    # Avoid reversing immediately a degrading move:
                    # resume the scan just after the reversed sub-path
                    j = best_i ^ 1
                    # Is there an improvement?
                    if best_length > length:
                        best_length = length
                        tour = tsp_2opt_data_structure_to_tour(t)
                        print('Noising', iteration, length)
                iteration += 1
                if iteration % (n * n) == 0:
                    current_noise *= alpha             # Decrease noise
                j = t[j]                               # Next j
            i = t[i]                                   # Next i
    return tour, best_length


To evaluate and perform a move in constant time, the data structure introduced in Section 5.1.3.3 has been used. However, this data structure does not allow evaluating a random move $(i, j)$ in constant time. Indeed, when traversing the tour in a given direction, given a city $i$ and its successor $s_i$, we cannot immediately identify the city $s_j$ that succeeds $j$ in the same direction.

The artifice used in this code is to systematically scan the whole neighborhood 
instead of randomly drawing the move. The noise added to the evaluation of a move 
is such that the acceptance criterion is identical to that of a simulated annealing. 
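The equivalence with simulated annealing can be checked in one line. Writing $T$ for the current noise level (`current_noise` in Code 7.2) and $u \in (0, 1)$ for the uniform number returned by `rando()`, and using that $\ln$ is increasing and $T > 0$:

$$
\Delta + T \ln u < 0 \iff \ln u < -\frac{\Delta}{T} \iff u < e^{-\Delta/T},
$$

which is the acceptance test $e^{-\Delta/T} > u$ of Algorithm 7.1. Since $\ln u \le 0$, the added noise $T \ln u$ is never positive, so improving moves ($\Delta < 0$) are always accepted.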


7.6 Late Acceptance Hill Climbing 


Another technique is similar to simulated annealing, threshold accepting, and the great deluge. Its core idea is to defer the acceptance criterion of a neighbor solution: instead of comparing the value of the neighbor with that of the current solution, it is compared with the value obtained $h$ iterations before. Thus, a neighbor solution is accepted either if it is at least as good as the current solution, or if it is better than the solution visited $h$ iterations previously. The derived method is called Late Acceptance Hill Climbing (LAHC).

"Hill climbing" refers to an improvement method seeking to maximize an objective. Naturally, a descent method is obtained by changing the acceptance criterion of a neighbor solution. An LAHC implementation requires storing a list $L$ of the values of the $h$ previously visited solutions. This strategy allows a self-calibration of the acceptance criterion of a worse solution. Unlike the methods viewed above, LAHC is insensitive to the order of magnitude of the fitness values.

The framework of LAHC, given by Algorithm 7.6, does not specify a stopping criterion. One possibility is to set the probability $p$ of being in a local optimum relative to the neighborhood $M$. The stopping criterion corresponding to this probability is to have performed $|M|(\ln|M| - \ln(1 - p))$ iterations without improving the current solution. Indeed, after that many random draws, a union bound shows that every move of $M$ has been tried at least once with probability at least $p$.
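Algorithm 7.6 and the stopping rule above can be sketched in Python. This is a minimal illustration under assumptions: a solution is any Python value, `random_neighbor(s, rng)` is a hypothetical callable returning $s \oplus m$ for a random move $m \in M$, and the counter of iterations without improvement is reset whenever the current solution strictly improves. The names `lahc` and `lahc_iteration_limit` are illustrative.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def lahc_iteration_limit(neighborhood_size: int, p: float) -> int:
    """|M| (ln|M| - ln(1 - p)) iterations without improvement of the current solution."""
    m = neighborhood_size
    return math.ceil(m * (math.log(m) - math.log(1 - p)))


def lahc(s: S, f: Callable[[S], float],
         random_neighbor: Callable[[S, random.Random], S],
         h: int, limit: int, rng: random.Random) -> Tuple[S, float]:
    """Late acceptance hill climbing for minimization (sketch of Algorithm 7.6)."""
    fs = f(s)
    L = [fs] * h                      # values of the h previous solutions
    i = 0
    best, fbest = s, fs
    without_improvement = 0
    while without_improvement < limit:
        cand = random_neighbor(s, rng)
        fc = f(cand)
        without_improvement = 0 if fc < fs else without_improvement + 1
        if fc < L[i] or fc <= fs:     # late acceptance criterion
            s, fs = cand, fc
            if fs < fbest:
                best, fbest = s, fs
        if fs < L[i]:
            L[i] = fs
        i = (i + 1) % h
    return best, fbest
```

For instance, with $|M| = 2$ and $p = 0.99$, `lahc_iteration_limit(2, 0.99)` gives 11 iterations.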


7.7 Variable Neighborhood Search


Variable neighborhood search (VNS) [7] implements an idea called strategic oscillations: the search alternates intensification and diversification phases. The chief idea of this method is to rely on several neighborhoods $M_1, \dots, M_p$. A first neighborhood, $M_1$, is exploited as usual to find local optima; this is one of the most elementary ways to intensify the search. The other neighborhoods allow escaping from these optima by performing random moves, which increasingly destroy the structure of the local optima. Performing random moves is also one of the simplest ways to diversify the search and explore other portions of the solution space. The framework of a basic VNS is provided by Algorithm 7.7.

A very limited number of neighborhoods is generally used. In this case,
the method is repeated several times (see Problem 7.4). Several variants of this
framework have been proposed: VNS descent, VNS decomposition, skewed VNS,
and reduced VNS.


Algorithm 7.6: Late acceptance hill climbing

Input: Solution s, fitness function f to minimize, neighborhood structure M, parameter
h, stopping criterion
Result: Improved solution s
1  for k ∈ 0 ... h − 1 do
2      L_k ← f(s)
3  i ← 0
4  repeat
5      Randomly select a move m ∈ M
6      if f(s ⊕ m) < L_i or f(s ⊕ m) ≤ f(s) then
7          s ← s ⊕ m
8      if f(s) < L_i then
9          L_i ← f(s)
10     i ← (i + 1) mod h
11 until the stopping criterion is satisfied


Algorithm 7.7: Variable neighborhood search. When a limited number p of
neighborhoods is available, the algorithm is repeated several times

Input: Solution s, fitness function f to minimize, neighborhood structures M1 ... Mp
Result: s*
1  s* ← s
2  k ← 1
3  while k ≤ p do
4      Randomly generate a move m ∈ M_k
5      s ← s ⊕ m
6      Find the local optimum s' associated with s in neighborhood M1
7      if f(s') < f(s*) then
8          s* ← s'
9          k ← 1
10     else
11         k ← k + 1
12     s ← s*


Code 7.3 provides a very simple VNS implementation for the TSP. 
Code 7.3 tsp_VNS.py VNS implementation for the TSP. The neighborhood M_k consists in
swapping two cities k times. The repair method is an ejection chain. In addition to its extreme
simplicity, this implementation requires no parameters


```python
from random_generators import unif     # Listing 12.1
from tsp_utilities import tsp_length   # Listing 12.2
from tsp_LK import tsp_LK              # Listing 12.3

######### Variable Neighborhood Search for the TSP
def tsp_VNS(d,            # Distance matrix
            best_tour,    # TSP tour
            best_length):
    n = len(best_tour)
    iteration, k = 0, 1
    while k < n:
        tour = best_tour[:]
        for _ in range(k):                        # Perturb solution
            u = unif(0, n - 1)
            v = unif(0, n - 1)
            tour[u], tour[v] = tour[v], tour[u]
        length = tsp_length(d, tour)
        tour, length = tsp_LK(d, tour, length)    # Repair with an ejection chain
        iteration += 1
        if length < best_length:
            best_tour = tour[:]
            best_length = length
            print('VNS {:d}\t {:d}\t {}'.format(iteration, k, length))
            k = 1                                 # Back to the smallest neighborhood
        else:
            k += 1
    return best_tour, best_length
```


7.8 GRASP 


The greedy randomized adaptive search procedure (GRASP) was proposed by Feo
and Resende [5]. It repeatedly improves, with a local search, a solution obtained
with a greedy constructive method. The latter incorporates a random component so
that it produces various solutions. This method comprises two parameters: I_max, the
number of repetitions of the outer loop of the algorithm, and α, which adjusts the degree
of randomization. The framework of the method is provided by Algorithm 7.8.
Code 7.4 implements the construction of a solution using GRASP. In practice,
this code is repeated several times, and the best solution produced is retained.
Algorithm 7.8: GRASP. The programmer must initially design a method,
local_search, for improving a solution. The user must provide two parameters,
I_max, which sets the computational effort, and α, which sets the random
choice level. The differences with the greedy constructive Algorithm 4.2 are
highlighted

Input: Set E of elements constituting a solution; incremental cost function c(s, e); fitness
function f to minimize; parameters I_max and 0 ≤ α ≤ 1; improvement method
local_search
Result: Complete solution s*
1  f* ← ∞
2  for I_max iterations do
3      Initialize s to a trivial partial solution
4      R ← E    // Elements that can be added to s
5      while R ≠ ∅ do
6          Find c_min = min_{e ∈ R} c(s, e) and c_max = max_{e ∈ R} c(s, e)
7          Choose randomly, uniformly e' ∈ R such that
               c_min ≤ c(s, e') ≤ c_min + α(c_max − c_min)
8          s ← s ∪ e'    // Include e' in the partial solution s
9          Remove from R the elements that cannot be added any more to s
10     s' ← local_search(s)    // Find the local optimum associated with s
11     if f* > f(s') then
12         f* ← f(s')
13         s* ← s'


Code 7.4 tsp_GRASP.py GRASP implementation for the TSP. The randomized greedy construction
is based on the nearest neighbor criterion. The improvement procedure is the ejection chain of
Code 12.3


```python
from random_generators import rand_permutation  # Listing 12.1
from tsp_utilities import tsp_length            # Listing 12.2
from tsp_LK import tsp_LK                       # Listing 12.3

######### Procedure for producing a TSP tour using GRASP principles
def tsp_GRASP(d,       # Distance matrix
              alpha):  # Randomization level, 0 <= alpha <= 1
    n = len(d[0])
    tour = rand_permutation(n)
    for i in range(n - 1):
        # Determine c_min and c_max incremental costs
        c_min, c_max = float('inf'), float('-inf')
        for j in range(i + 1, n):
            if c_min > d[tour[i]][tour[j]]:
                c_min = d[tour[i]][tour[j]]
            if c_max < d[tour[i]][tour[j]]:
                c_max = d[tour[i]][tour[j]]
        # Find the next city to insert, based on lower cost; the remaining
        # cities are in random order, so the first acceptable one is random
        next_city = i + 1
        while d[tour[i]][tour[next_city]] > c_min + alpha * (c_max - c_min):
            next_city += 1
        tour[i + 1], tour[next_city] = tour[next_city], tour[i + 1]

    length = tsp_length(d, tour)
    tour, length = tsp_LK(d, tour, length)
    return tour, length
```
Setting α = 0 leads to a purely greedy constructive method. Unless
many elements have the same incremental cost, repeating the construction
does not make sense. Setting α = 1 leads to a purely random construction; the
method then amounts to repeating a local search from random starting solutions.
This technique is often used because it produces better solutions than a single local
search run and requires negligible coding effort. To benefit from the advantages
of GRASP, it is necessary to tune the α parameter. Usually, the method reaches its
full potential for values close to 0. Note also that the initialization of the partial
solution may include a random component. For the TSP, it can be the departure
city. The incremental cost function may correspond to the nearest neighbor criterion.
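A self-contained Python sketch of this randomized construction for the TSP follows, with a random departure city and the nearest neighbor criterion as incremental cost. Unlike Code 7.4, it draws uniformly among all cities that satisfy the threshold of Algorithm 7.8 and it omits the local search step; all names and the five-city instance are illustrative.

```python
import random


def grasp_construct(d, alpha, rng):
    """Randomized nearest neighbor construction for the TSP (no local search)."""
    n = len(d)
    tour = [rng.randrange(n)]                # random departure city
    remaining = set(range(n)) - {tour[0]}
    while remaining:
        last = tour[-1]
        c_min = min(d[last][c] for c in remaining)
        c_max = max(d[last][c] for c in remaining)
        threshold = c_min + alpha * (c_max - c_min)
        # Cities whose incremental cost does not exceed the threshold
        candidates = sorted(c for c in remaining if d[last][c] <= threshold)
        city = rng.choice(candidates)        # uniform choice among them
        tour.append(city)
        remaining.remove(city)
    return tour


def tour_length(d, tour):
    return sum(d[tour[i - 1]][tour[i]] for i in range(len(tour)))


def grasp(d, alpha, iterations, rng):
    """Repeat the construction and keep the shortest tour."""
    best = min((grasp_construct(d, alpha, rng) for _ in range(iterations)),
               key=lambda t: tour_length(d, t))
    return best, tour_length(d, best)


# Illustrative instance: five cities on a line
positions = [0, 1, 3, 7, 15]
d = [[abs(a - b) for b in positions] for a in positions]
tour, length = grasp(d, alpha=0.3, iterations=10, rng=random.Random(4))
```

With alpha = 0, only the nearest unvisited city qualifies, so the construction reduces to the nearest neighbor heuristic from a random start; with alpha = 1, every unvisited city qualifies.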


Problems 


7.1 SA Duration 

How many iterations does simulated annealing perform if it starts with an initial
temperature T_0 and ends at temperature T_f, knowing that the temperature is
multiplied by α at each iteration?


7.2 Tuning GRASP 

Try to tune the α parameter of the GRASP code. Take the TSPLIB problem instance
tsp225. Do good values depend on the number of iterations I_max the method
performs?


7.3 VNS with a Single Neighborhood 

VNS requires p different neighborhoods and uses a particular neighborhood
M1 to find local optima. If we only have M1, how can we build the other
neighborhoods?


7.4 Record to Record 

In tsp_VNS, n different neighborhoods are used, leading to a method without
parameters. Could a more efficient algorithm be obtained by limiting the number of
neighborhoods and repeating the search several times? How many neighborhoods
should be used, and how many times should the search be repeated?
Chapter 8
Construction Learning


After having studied the four basic principles (modeling, decomposition, construction,
and improvement), this chapter introduces the fifth principle of metaheuristics:
learning mechanisms. The algorithms seen in the previous chapter rely solely
on chance to try to obtain better solutions than those provided by greedy
constructive methods or local searches. This is probably not very satisfactory from
an intellectual point of view. Rather than relying solely on chance, this chapter
studies how to implement learning techniques to build new solutions. Learning
processes require three ingredients:


* Repeating experiments and analysing successes and failures: we only learn by
making mistakes!

* Memorizing what has been done.

* Forgetting the details. This gives the ability to generalize when in a similar but
different situation.


8.1 Artificial Ants 


The artificial ant technique provides simple mechanisms to implement these 
learning ingredients in the context of constructing new solutions. 

The social behavior of some animals has always been fascinating, especially when a
population achieves feats completely out of reach of an isolated individual.
This is the case with bees, termites, or ants: although each individual follows an
extremely simple behavior, a colony is able to build complex nests or efficiently
supply its population with food.
8.1.1 Real Ant Behavior


Following the work of Deneubourg et al. [2], who described the almost algorithmic
behavior of ants, researchers had the idea of simulating this behavior to solve
difficult problems.

The typical behavior of ants is illustrated in Fig. 8.1 with an experiment conducted
on a real, isolated colony. The colony can only look for food by going out
through a single orifice. This orifice is connected to a tube that splits into two
branches, which join further on. The left branch is shorter than the right one. As the
ants initially have no information on this fact, they distribute evenly between both
branches (Fig. 8.1a).

While exploring, each ant drops a chemical substance that it can detect with
its antennas, which will assist it when returning to the anthill. Such an
information-carrying chemical substance is called a pheromone. On the way back, an ant
deposits a quantity of pheromones depending on the quality of the food source.
Naturally, an ant that has discovered a short path is able to return earlier than one
that took the longer branch.

[Fig. 8.1 panels: (a) Initial situation; (b) Final situation; label: source of food]


Fig. 8.1 Behavior of an ant colony separated from a food source by a path that splits into two
branches. Initially, ants are evenly distributed between both branches (a). The ants having selected
the shorter path arrive earlier at the food source and therefore lay additional pheromones sooner
on the way back. The quantity of pheromones deposited on the shorter path thus grows faster. After
a while, virtually all ants use the shorter branch (b)
Therefore, the quantity of pheromones deposited on the shorter path grows
faster. Consequently, a newly arriving ant has information on which way to take and
biases its choice in favour of the shorter branch. After a while, virtually all ants are
observed to use the shorter branch (Fig. 8.1b). Thus, the colony collectively
determines an optimal path, while each individual sees no further than the tip of its
antennas.


8.1.2 Transcription of Ant Behavior to Optimization 


If an ant colony manages to optimize the length of a path, even in a dynamic context,
we should be able to transcribe the behavior of each individual into a simple process
for optimizing intractable problems. This transcription may be obtained as follows:


* An ant represents a process performing a procedure that constructs a solution
with a random component. Many of these processes may run in parallel.

* Pheromone trails are values τ_e associated with each element e constituting a
solution.

* Trails play the role of a collective memory. After a solution is constructed, the
trail values of the elements constituting it are increased by a quantity
depending on the solution quality.

* The oblivion phenomenon is simulated by the evaporation of pheromone trails
over time.


It remains to clarify how these components can be put in place. The construction
process can use a randomized construction technique, almost similar to the GRASP
method. However, the random component must be biased not only by the incremental
cost function c(s, e), which represents the a priori interest of including element
e in the partial solution, but also by the value τ_e, which is the a posteriori interest
of this element. The latter is only known after a multitude of solutions has been
constructed.

The marriage of these two forms of interest is achieved by selecting the next element
e to include in the partial solution s with a probability proportional to $\tau_e^{\alpha} \cdot c(s, e)^{\beta}$,
where α > 0 and β < 0 are two parameters balancing the respective importance
accorded to memory and incremental cost. The update of artificial pheromones
is performed in two steps, each requiring a parameter. First, the evaporation of
pheromones is simulated by multiplying all the values by 1 − ρ, where 0 <
ρ < 1 represents the evaporation rate. Then, each element e constituting a newly
constructed solution s has its τ_e value increased by a quantity 1/f(s), where f(s) is
the solution cost, which is assumed to be minimized and greater than zero.
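The selection rule and the two-step pheromone update just described can be sketched in Python as follows. Trails are stored in a dictionary keyed by element, and `cost(e)` stands for the incremental cost c(s, e) of adding e to the current partial solution, assumed to be positive. The function names and the three-element example are illustrative assumptions; `random.Random.choices` performs the biased random draw.

```python
import random


def ant_choose(candidates, tau, cost, alpha, beta, rng):
    """Draw e with probability proportional to tau[e]**alpha * cost(e)**beta.

    cost(e) stands for the incremental cost c(s, e) > 0; with beta < 0,
    cheaper elements are more likely to be chosen.
    """
    weights = [tau[e] ** alpha * cost(e) ** beta for e in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def update_pheromones(tau, solution, f_solution, rho):
    """Evaporate all trails by a factor (1 - rho), then add 1/f(s) on s."""
    for e in tau:
        tau[e] *= 1.0 - rho
    for e in solution:
        tau[e] += 1.0 / f_solution
    return tau


tau = {"a": 1.0, "b": 1.0, "c": 1.0}
rng = random.Random(0)
costs = {"a": 1.0, "b": 2.0, "c": 4.0}
chosen = ant_choose(["a", "b", "c"], tau, cost=costs.get,
                    alpha=1.0, beta=-1.0, rng=rng)
tau = update_pheromones(tau, ["a", "b"], f_solution=4.0, rho=0.5)
```

After the update, the trails of "a" and "b" are 0.5 + 0.25 = 0.75, while the trail of "c", which only evaporated, is 0.5.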
8.1.3 MAX-MIN Ant System


The first artificial ant colony applications contained only the components described
above. The trail update is a positive feedback process. There is a bifurcation point
between a completely random process (no learning) and an almost deterministic
one that repeatedly constructs the same solution (too fast learning). Therefore, it is
difficult to tune a progressive learning process with the three parameters α, β, and ρ.

To remedy this, Stützle and Hoos [5] suggested limiting the trails between two
values τ_min and τ_max. Hence, the probability of selecting an element is bounded
between a minimum and a maximum. This prevents elements from acquiring an extremely
high trail value, which would imply that all solutions contain these elements. This leads to
the MAX-MIN ant system, which proved much more effective than many other
previously proposed frameworks. It is given in Algorithm 8.1.


Algorithm 8.1: MAX-MIN ant system framework

Input: Set E of elements constituting a solution; incremental cost function c(s, e) > 0;
fitness function f to minimize; parameters I_max, m, α, β, ρ, τ_min, τ_max; and
improvement method a(·)
Result: Solution s*
1  f* ← ∞
2  for ∀e ∈ E do
3      τ_e ← τ_max
4  for I_max iterations do
5      for k = 1 ... m do
6          Initialize s as a trivial, partial solution
7          R ← E    // Elements that can be added to s
8          while R ≠ ∅ do    // Build a new solution
9              Randomly choose e ∈ R with a probability proportional to τ_e^α · c(s, e)^β
                   // Ant colony formula
10             s ← s ∪ e
11             From R, remove the elements that cannot be added any more to s
12         s_k ← a(s)    // Find the local optimum s_k associated with s
13         if f* > f(s_k) then    // Update the best solution found
14             f* ← f(s_k)
15             s* ← s_k
16     for ∀e ∈ E do    // Pheromone trail evaporation
17         τ_e ← (1 − ρ) · τ_e
18     s_b ← best solution from {s_1, ..., s_m}
19     for ∀e ∈ s_b do    // Update trail, maintaining it between the bounds
20         τ_e ← max(τ_min, min(τ_max, τ_e + 1/f(s_b)))
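Lines 16 to 20 of Algorithm 8.1 can be sketched in Python as follows, with trails stored in a dictionary. As an assumption consistent with the statement that trails are limited between τ_min and τ_max, the sketch also keeps evaporated trails above τ_min, whereas line 20 only shows the bounds for the reinforced elements. The function name is hypothetical.

```python
def mmas_update(tau, best_of_iteration, f_best_of_iteration, rho, tau_min, tau_max):
    """MAX-MIN trail update: bounded evaporation, then bounded reinforcement.

    Only the elements of the best solution s_b among the m ants of the
    iteration are reinforced, by 1/f(s_b).
    """
    for e in tau:                               # evaporation, kept >= tau_min
        tau[e] = max(tau_min, (1.0 - rho) * tau[e])
    for e in best_of_iteration:                 # reinforcement, kept <= tau_max
        tau[e] = max(tau_min, min(tau_max, tau[e] + 1.0 / f_best_of_iteration))
    return tau


tau = mmas_update({"a": 10.0, "b": 1.0}, ["a"], 2.0,
                  rho=0.5, tau_min=0.6, tau_max=5.0)
```

Here the trail of "a" is capped at τ_max = 5.0 despite its reinforcement, and the trail of "b" cannot evaporate below τ_min = 0.6, so neither element can be selected with probability close to 1 or 0.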


This framework includes an improvement method. Indeed, implementations
of "pure" artificial ant colonies, based solely on building solutions, have proven
inefficient and difficult to tune. There may be exceptions, especially for the
treatment of highly dynamic problems, where a situation that is optimal at a given time
is no longer optimal at another one.

Algorithm 8.1 has a theoretical advantage: it can be proved that if the number of
iterations I_max → ∞ and if τ_min > 0, then it finds a globally optimal solution with
probability tending to one. The proof is based on the fact that τ_min > 0
implies that the probability of building a globally optimal solution is not zero. In
practice, however, this theoretical result is not tremendously useful.


8.1.4 Fast Ant System


One of the disadvantages of numerous frameworks based on artificial ants is their
large number of parameters and the difficulty of tuning them. This is the reason
why we have not presented the Ant System (AS [1]) or the Ant Colony System (ACS [3])
in detail. In addition, it can be challenging to design an incremental cost function
providing pertinent results. An example is the quadratic assignment problem. Since
every pair of elements contributes to the fitness function, the last element to be
included can contribute significantly to the quality of the solution, whereas the
first element placed does not incur any cost. This is why a simplified framework called
FANT (for Fast Ant System) has been proposed.

In addition to the number of iterations, I_max, the user of this framework must only
specify one other parameter, τ_b. It corresponds to the reinforcement of the artificial
pheromone trails that is systematically applied, at each iteration, to the elements of
the best solution found so far. The reinforcement of the trails associated with the
elements of the solution constructed at the current iteration, τ_c, is a self-adaptive
parameter. Initially, this parameter is set to 1. When over-learning is detected (the
best solution is generated again), τ_c is incremented, and all trails are reset to τ_c.
This implements the oblivion process and increases the diversity of the
solutions generated.

If the best solution has been improved, then τ_c is reset to 1 to give more weight
to the elements constituting this improved solution. Finally, FANT incorporates
a local search method since, as mentioned above, the construction mechanism alone
often produces poor-quality solutions. Algorithm 8.2 provides the FANT framework.

Figure 8.2 illustrates the FANT behavior on a TSP instance with 225 cities. In
this experiment, τ_b was fixed to 50. This figure gives the number
of edges that differ from the best solution found so far, before and after calling the
improvement procedure.

A natural implementation of the trails for the TSP is to use a matrix τ rather
than a vector. Indeed, an element e of a solution is an edge [i, j], defined by its two
incident vertices. Therefore, the value τ_ij is the a posteriori interest of having the edge
[i, j] in a solution. The initialization of this trail matrix and its update may therefore
be implemented with the procedures given in Code 8.2.
Algorithm 8.2: FANT framework. Most of the lines of code are about
automatically adjusting the weight τ_c assigned to the newly built solution
against the weight τ_b of the best solution achieved so far. If the latter is
improved or if over-learning is detected, the trails are reset

Input: Set E of elements constituting a solution; fitness function f to minimize;
parameters I_max, τ_b and improvement method a(·)
Result: Solution s*
1  f* ← ∞
2  τ_c ← 1
3  for ∀e ∈ E do
4      τ_e ← τ_c
5  for I_max iterations do
6      Initialize s to a partial, trivial solution
7      R ← E    // Elements that can be added to s
8      while R ≠ ∅ do
9          Randomly choose e ∈ R with a probability proportional to τ_e
10         s ← s ∪ e
11         From R, remove the elements that cannot be added any more to s
12     s' ← a(s)    // Find the local optimum s' associated with s
13     if s' = s* then    // Manage over-learning
14         τ_c ← τ_c + 1    // More weight to the newly constructed solutions
15         for ∀e ∈ E do    // Erase all trails
16             τ_e ← τ_c
17     if f* > f(s') then    // Manage best solution improvement
18         f* ← f(s')
19         s* ← s'    // Update best solution
20         τ_c ← 1    // Give minimum weight to the newly constructed solutions
21         for ∀e ∈ E do    // Erase all trails
22             τ_e ← τ_c
23     for ∀e ∈ s' do    // Reinforce the trails associated with the current solution
24         τ_e ← τ_e + τ_c
25     for ∀e ∈ s* do    // Reinforce the trails associated with the best solution
26         τ_e ← τ_e + τ_b


The core of an ant heuristic is the construction of a new solution exploiting
artificial pheromones. Code 8.1 provides a procedure that does not exploit the a priori
interest (an incremental cost function) of the elements constituting a solution. In
this implementation, the departure city is the first of a random permutation p. At
iteration i, the first i cities are definitively chosen. The next city is then
selected with a probability proportional to the trail values of the remaining elements.
Fig. 8.2 FANT behaviour on a TSP instance with 225 cities. For each iteration, the diagram
provides the number of edges different from the best solution found by the algorithm, before
and after calling the ejection chain local search. Vertical lines indicate improvements in the best
solution found. In this experiment, the last of these improvements corresponds to the optimal
solution


Code 8.1 generate_solution_trail.py Implementation of the generation of a permutation exploiting
only the information contained in the pheromone trails


```python
from random_generators import unif        # Listing 12.1
from tsp_utilities import tsp_length      # Listing 12.2

######### Building a solution using artificial pheromone trails
def generate_solution_trail(d,      # Distance matrix
                            tour,   # Tour produced by the ant
                            trail): # Pheromone trails (integer values)
    n = len(tour)
    for i in range(1, n - 1):
        # Sum of the trails from the last chosen city to every remaining city
        total = 0
        for j in range(i, n):
            total += trail[tour[i - 1]][tour[j]]
        # Roulette wheel: position j is chosen with probability trail / total
        target = unif(0, total - 1)
        j = i
        total = trail[tour[i - 1]][tour[j]]
        while total <= target:
            total += trail[tour[i - 1]][tour[j + 1]]
            j += 1
        tour[j], tour[i] = tour[i], tour[j]
    return tour, tsp_length(d, tour)
```
Once the three procedures given by Codes 8.1 and 8.2, as well as an
improvement procedure, are available, the implementation of FANT is very simple.
Such an implementation, using an ejection chain local search, is given by Code 8.3.


Code 8.2 init_update_trail.py Implementation of the trail matrix initialization and update for the
FANT method applied to a permutation problem. If the solution just generated is identical to the best
one found so far, the trails are reset. Otherwise, the trails are reinforced with both the current
solution and the best one


```python
######### (Re-)initialize all trails
def init_trail(initial_value,  # Initial value for all trails
               trail):         # Pheromone trails
    n = len(trail[0])
    for i in range(n):
        for j in range(n):
            trail[i][j] = initial_value
    for i in range(n):
        trail[i][i] = 0
    return trail


######### Updating trail values
def update_trail(tour,          # Last solution generated by an ant
                 global_best,   # Global best solution
                 exploration,   # Reinforcement of last solution
                 exploitation,  # Reinforcement of global best solution
                 trail):        # Pheromone trails
    if tour == global_best:
        exploration += 1        # Give more weight to exploration
        trail = init_trail(exploration, trail)
    else:
        n = len(trail[0])
        for i in range(n):
            trail[tour[i]][tour[(i + 1) % n]] += exploration
            trail[global_best[i]][global_best[(i + 1) % n]] += exploitation
    return trail, exploration
```
Code 8.3 tsp_FANT.py FANT for the TSP. The improvement procedure is given by Code 12.3


```python
from random_generators import rand_permutation           # Listing 12.1
from generate_solution_trail import generate_solution_trail  # Listing 8.1
from init_update_trail import init_trail, update_trail   # Listing 8.2
from tsp_LK import tsp_LK                                 # Listing 12.3

######### Fast Ant System for the TSP
def tsp_FANT(d,             # Distance matrix
             exploitation,  # FANT parameter: global reinforcement
             iterations):   # Number of solutions to generate
    n = len(d[0])
    best_cost = float('inf')
    exploration = 1
    trail = [[-1] * n for _ in range(n)]
    trail = init_trail(exploration, trail)
    tour = rand_permutation(n)
    for i in range(iterations):
        # Build solution
        tour, cost = generate_solution_trail(d, tour, trail)
        # Improve built solution with a local search
        tour, cost = tsp_LK(d, tour, cost)
        if cost < best_cost:
            best_cost = cost
            print('FANT {:d} {}'.format(i + 1, cost))
            best_sol = list(tour)
            exploration = 1          # Reset exploration to lowest value
            trail = init_trail(exploration, trail)
        else:
            # Pheromone trail reinforcement - increase memory
            trail, exploration = update_trail(tour, best_sol,
                                              exploration, exploitation, trail)
    return best_sol, best_cost
```


8.2 Vocabulary Building 


Vocabulary building is a more global learning method than artificial ant colonies.
The idea is to memorize fragments of solutions, called words, and to
construct new solutions from these fragments. Put differently, a dictionary is
used to build a sentence attempt in a randomized way. A repair/improvement
procedure makes this solution attempt feasible and increases its quality. Finally,
this new solution sentence is fragmented into new words that enrich the dictionary.

This method was proposed in [4] and is not yet widely used in practice,
although it has proved efficient for a number of problems. For instance, the method
can be naturally adapted to the vehicle routing problem. Indeed, it is relatively
easy to construct solutions with tours similar to those of the best solution
known. This is illustrated in Fig. 8.3.

By building numerous solutions using randomized methods, the first dictionary 
of solution fragments can be acquired. This is illustrated in Fig. 8.4. 
Fig. 8.3 (a) The optimal solution to a VRP instance. (b) A few tours quickly obtained with a taboo
search. Note the great similarities between the latter and the tours of the optimal solution


Fig. 8.4 Fragments of solutions (vehicle routing tours) constituting the dictionary. A partial
solution is built by randomly selecting a few of these fragments (indicated in color)
Fig. 8.5 (a) A sentence attempt is constructed by randomly selecting a few words from the
dictionary. (b) This attempt is completed and improved


Once an initial dictionary has been constructed, solution attempts are built, for
instance, by selecting a subset of tours that do not contain common customers.
Such a solution is not necessarily feasible. Indeed, during the construction process,
the dictionary may contain no tour made up only of customers not yet covered, so that
some customers remain unserved. Therefore, it is necessary to repair this solution
attempt, for instance, by means of a method similar to that used to produce the first
dictionary but starting with the solution attempt. This phase of the method is illustrated
in Fig. 8.5. The improved solution is likely to contain tours that are not yet in the
dictionary. These are included to enrich it for subsequent iterations.
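Building a sentence attempt for the vehicle routing problem can be sketched as follows: the words (tours, given as tuples of customers) are examined in random order, and a word is kept only if it shares no customer with the words already chosen. The function name and the optional `max_words` limit are illustrative assumptions, and the repair/improvement step is not shown.

```python
import random


def sentence_attempt(dictionary, rng, max_words=None):
    """Randomly pick dictionary words (tours) that share no customer.

    Returns the chosen words and the set of customers they cover; some
    customers may remain uncovered, so a repair step must follow.
    """
    words = list(dictionary)
    rng.shuffle(words)
    chosen, covered = [], set()
    for word in words:
        if covered.isdisjoint(word):          # no common customer
            chosen.append(word)
            covered.update(word)
            if max_words is not None and len(chosen) == max_words:
                break
    return chosen, covered


dictionary = [(1, 2, 3), (3, 4), (5, 6), (2, 5)]
chosen, covered = sentence_attempt(dictionary, random.Random(0))
```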

The technique can be adapted to other problems, like the TSP. In this case, the
dictionary words can be edges appearing in a tour. Figure 8.6 shows all the edges
present in more than two-thirds of 100 tours obtained by applying a local search
starting with a random solution. The optimal solution to this problem is known.
Hence, it is possible to highlight the few frequently obtained edges that are not
part of the optimal solution. Interestingly, nearly 80% of the edges of the optimal
solution have been identified by initializing the dictionary with a basic improvement
method.
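Extracting the edges present in more than two-thirds of a set of tours, as for Fig. 8.6, can be sketched as follows. The tours are assumed to be given (for instance, as local optima obtained from random starts), and the function name is hypothetical; using `Fraction` keeps the two-thirds threshold exact.

```python
from collections import Counter
from fractions import Fraction


def frequent_edges(tours, fraction=Fraction(2, 3)):
    """Return the edges appearing in more than `fraction` of the tours.

    Each edge is an unordered pair of cities, counted at most once per tour.
    """
    counts = Counter()
    for tour in tours:
        n = len(tour)
        counts.update({frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)})
    return {edge for edge, c in counts.items() if c > fraction * len(tours)}


tours = [[0, 1, 2, 3], [0, 1, 3, 2], [1, 0, 2, 3]]
words = frequent_edges(tours)
```

In this small example, only the edges [0, 1] and [2, 3] belong to all three tours; the other edges appear at most twice and are discarded.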
Fig. 8.6 An optimal solution (light color) and fragments of tours constituting an initial dictionary
for the TSP instance pr2392. The fragments are obtained by repeating 100 local searches starting
with random solutions and only retaining the edges appearing in more than 2/3 of the local optima.
Interestingly, almost all these edges belong to an optimal solution. The few edges that are not part
of it are highlighted (darkest color)


Problems 


8.1 Artificial Ants for Steiner Tree 
For the Steiner tree problem, how can the trails of an artificial ant colony be defined?
Describe how these trails are exploited.


8.2 Tuning the FANT Parameter 

Determine good values for the parameter τ_b of the tsp_FANT method provided
by Code 8.3 when the latter performs 300 iterations. Consider the TSPLIB instance
tsp225.


8.3 Vocabulary Building for Graph Coloring 
Describe how vocabulary building can be adapted to the problem of coloring
the vertices of a graph.
Chapter 9
Local Search Learning


Local searches play an essential role in metaheuristics. Virtually all efficient heuristic
methods incorporate a local search. Moreover, metaheuristics are sometimes
defined as a master process guiding a local search. In Chapter 5, we have already
seen some basic neighborhood adaptation techniques, in particular its limitation
by a list of candidate moves, granular search and its extension by filter-and-fan
search, and ejection chains.

Most randomized methods reviewed in Chapter 7 are dedicated to extending local
searches. They do not implement a learning process: they only memorize the
best solution found so far or statistics for self-calibrating their parameters, which
allows taking the unit of measurement of the fitness function into account. The next
step in the sophistication of metaheuristics is to learn how to modify solutions to
a problem locally. Among the popular techniques, taboo search (also written tabu search)
offers many strategies and various local search learning mechanisms. This chapter
reviews the basic mechanisms. Other strategies are proposed in the book by Glover
and Laguna [5] and take a more natural place in other chapters of the present book.


9.1 Taboo Search 


Taboo search was proposed by Fred Glover in 1986. Its key idea is to explore the
solution space with a local search beyond local optima [2-4]. This implies designing
a mechanism to prevent cycling, that is, entering a cycle where a limited subset of
solutions is repeatedly visited. The simplest concept to imagine is to memorize all
the solutions successively encountered during a local search and to prevent the
latter from choosing a neighbor solution that has already been visited. The visited
solutions thus become taboo.

This concept is simple but cumbersome to implement: a local search may
require millions of iterations, which means memorizing the same number of
solutions. Each neighbor solution must be checked to ensure that it has not already
been explored. Knowing that a neighborhood can contain thousands of solutions, we
quickly realize the hopelessness of this approach, either because of the
memory space needed to store the visited solutions or because of the computational
effort required to compare neighbor solutions.


9.1.1 Hash Table Memory 


A simple technique for approximating this principle of prohibiting previously visited
solutions is to use a hash table. An integer value h(s) is associated with each solution
s of the problem. If the solution s_i is visited at iteration i of the search, the value i is
stored in the entry h(s_i) mod m of an array T of m integers. Thus, the kth entry of
table T indicates at which iteration a solution whose hash value is k (modulo m) was
last visited.

The function h is generally not bijective over the set of solutions to the problem,
so different solutions can have the same hash value. Indeed, the size m of the array
T must be limited due to the available memory. This technique is, therefore, an
approximation of the concept of prohibiting solutions already visited: not only are
the latter prohibited, but so are all those that have the same hash value.
Moreover, since the value of m is limited, returning to a solution with a given hash
value cannot be forbidden forever; otherwise, all the solutions could end up prohibited
after as few as m iterations.

It is therefore necessary to implement a key feature of the learning process: 
oblivion. These considerations lead us to introduce the key parameter of a taboo 
search: the taboo duration, sometimes referred to as the taboo list length. 
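A minimal Python sketch of the table T with a taboo duration follows. The class name is hypothetical, and the hash values are assumed to be computed by the caller. The collision in the example (10 and 17 are equal modulo 7) illustrates why this memory only approximates the prohibition of visited solutions.

```python
class HashTabuMemory:
    """Table T of m integers approximating the prohibition of visited solutions.

    T[h % m] holds the last iteration at which a solution with hash value h
    (modulo m) was visited; such solutions stay taboo for `duration` iterations.
    """

    def __init__(self, m, duration):
        self.duration = duration
        self.table = [-duration - 1] * m      # nothing is taboo at iteration 0

    def visit(self, hash_value, iteration):
        self.table[hash_value % len(self.table)] = iteration

    def is_taboo(self, hash_value, iteration):
        last = self.table[hash_value % len(self.table)]
        return iteration - last <= self.duration


memory = HashTabuMemory(m=7, duration=3)
memory.visit(10, iteration=5)     # solution with hash value 10 seen at iteration 5
```

Storing iteration numbers rather than booleans implements oblivion for free: an entry simply stops being taboo once `duration` iterations have elapsed, without any explicit cleanup.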


9.1.1.1 Hash Functions 


The choice of a hash function to implement a taboo search is not very difficult.
In some cases, the value of the fitness function itself is perfectly suitable, especially
when the neighborhood includes many moves at zero cost (plateaus). Indeed, taboo search
chooses the best allowed move at each iteration, so neutral moves make learning
difficult: on a plateau, the choice between one neighbor and another is arbitrary,
and cycling can occur. When the fitness function takes a wide range of values,
forbidding a return to a given fitness value for a number of iterations often breaks
the local optimum structure and leads to another local optimum.

A general hash function is as follows. Let us use the notation introduced in
Chapter 4, devoted to constructive methods. A solution is composed of elements
$e \in E$. Each of them is associated with an integer value $z_e$. These values are
randomly generated at the beginning of the algorithm. The hash value of a solution $s$
is provided by $h(s) = \sum_{e \in s} z_e$. A more sophisticated hash technique using multiple
tables is discussed in [6]. It makes it possible to obtain the equivalent of a very large
table, while limiting the memory space.
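The additive hash function $h(s) = \sum_{e \in s} z_e$ can be sketched in Python as follows. The helper names are hypothetical, and the incremental update is an addition of ours: it relies only on the fact that a sum can be corrected by subtracting the keys of removed elements and adding those of inserted elements, so a move changing a few elements does not require recomputing the whole sum.

```python
import random


def make_element_keys(elements, seed=None, bits=32):
    """Draw a random integer z_e for each element e of E (done once, at start-up)."""
    rng = random.Random(seed)
    return {e: rng.getrandbits(bits) for e in elements}


def hash_value(solution, keys):
    """h(s) = sum of z_e over the elements e of the solution s."""
    return sum(keys[e] for e in solution)


def updated_hash(h, keys, removed=(), added=()):
    """Incremental update of h(s) after a move removing and adding elements."""
    return h - sum(keys[e] for e in removed) + sum(keys[e] for e in added)


keys = make_element_keys(range(10), seed=1)
h = hash_value({1, 4, 7}, keys)
h_new = updated_hash(h, keys, removed=[4], added=[5])
print(h_new == hash_value({1, 5, 7}, keys))  # True
```

Because the sum does not depend on the order of the elements, two encodings of the same solution get the same hash value; reducing it modulo $m$ gives the table entry.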


9.1.2 Taboo Moves 


Prohibition based on a hash function is uncommon in taboo search implementations. 
Frequently, one prohibits some moves or solutions with certain features. To be 
concrete, consider the example of the symmetric TSP. 

A 2-opt move can be characterized by the pair $[i, j]$, which consists of replacing
the edges $[i, s_i]$ and $[j, s_j]$ of the current solution $s$ by the edges $[i, j]$ and $[s_i, s_j]$.
One assumes here that the solution is provided by the "successor" $s_i$ of each city $i$ and
that the city $j$ comes "after" the city $i$ when traveling in the order given by $s$. If the
move $[i, j]$ is carried out at an iteration, one can prohibit the reverse move $[i, s_i]$
during the following iterations. This is a direct prohibition based on the opposite of
a move.

After performing the move $[i, j]$, another possibility is to indirectly prohibit the
moves leading to a solution containing both edges $[i, s_i]$ and $[j, s_j]$.
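The following Python sketch shows how a 2-opt move $[i, j]$ acts on a tour stored as a successor array, and which features the two kinds of prohibition would record. The function name `two_opt_successor` is hypothetical; it assumes that $j$ comes after $i$ in the tour, with $j \ne s_i$ and $i \ne s_j$, and returns the two removed edges, i.e. the features an indirect prohibition would mark as taboo.

```python
def two_opt_successor(succ, i, j):
    """Apply the 2-opt move [i, j] to a tour given by its successor array.

    Edges [i, succ[i]] and [j, succ[j]] are replaced by [i, j] and
    [succ[i], succ[j]]. Returns the new successor array and the removed edges.
    """
    si, sj = succ[i], succ[j]
    # Collect the path si -> ... -> j, whose direction must be reversed
    path = [si]
    while path[-1] != j:
        path.append(succ[path[-1]])
    new_succ = succ[:]
    new_succ[i] = j
    for city, previous in zip(path[1:], path[:-1]):
        new_succ[city] = previous            # Reverse each edge of the path
    new_succ[si] = sj
    return new_succ, [(i, si), (j, sj)]


tour_succ = [1, 2, 3, 4, 5, 0]               # Tour 0-1-2-3-4-5-0
new_succ, taboo_edges = two_opt_successor(tour_succ, 0, 3)
print(new_succ, taboo_edges)                 # [3, 4, 1, 2, 5, 0] [(0, 1), (3, 4)]
```

Applying the reverse move $[i, s_i] = [0, 1]$ to the new tour restores the original one, which is exactly the move a direct prohibition would forbid.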

By abuse of language, let $m^{-1}$ denote the inverse of a move, or a feature
of a solution that is forbidden after performing the move $m$ of a neighborhood
characterized by a set $M$ of moves. Although $(s \oplus m) \oplus m^{-1} = s$, there may
be various ways to define $m^{-1}$. Since the size of the neighborhood is limited,
it is necessary to relax the taboo status of a move after relatively few iterations.
Therefore, the taboo list is frequently presented as a short-term memory. The most
basic taboo search framework is given by Algorithm 9.1.
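The elementary framework of Algorithm 9.1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: moves are hashable values, the caller supplies `apply_move(s, m)` (computing $s \oplus m$) and `inverse(m)` (computing $m^{-1}$), and the taboo status is attached to moves rather than to solutions. All these names are illustrative.

```python
import math


def taboo_search(s, moves, f, apply_move, inverse, i_max, d):
    """Elementary taboo search (minimization) following Algorithm 9.1.

    After performing the move m*, its inverse is forbidden for the next d
    iterations.
    """
    best, best_value = s, f(s)                         # Line 1: s* <- s
    taboo_until = {}                                   # Move -> last taboo iteration
    for it in range(1, i_max + 1):                     # Line 2
        best_neighbor_value, m_star = math.inf, None   # Line 3
        for m in moves:                                # Line 4: non-taboo moves only
            if taboo_until.get(m, 0) >= it:
                continue
            value = f(apply_move(s, m))
            if value < best_neighbor_value:            # Lines 5-7
                best_neighbor_value, m_star = value, m
        if m_star is None:                             # Lines 8, 13-14
            raise RuntimeError('d too large: no move allowed!')
        taboo_until[inverse(m_star)] = it + d          # Line 9
        s = apply_move(s, m_star)                      # Line 10
        if f(s) < best_value:                          # Lines 11-12
            best, best_value = s, f(s)
    return best, best_value


# Toy usage: minimize (x - 3)^2 over the integers with moves +1 and -1
result = taboo_search(0, (+1, -1), lambda x: (x - 3) ** 2,
                      lambda x, m: x + m, lambda m: -m, i_max=10, d=1)
print(result)  # (3, 0)
```

With $d = 1$, the search keeps moving to the right after passing $x = 3$, because stepping back is always the taboo inverse of the last move; the best solution visited is nevertheless remembered.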


9.1.2.1 Implementation of Taboo Status 


If the neighborhood size is not too large, one can store, for each move, the
iteration from which it can be used again. Let us immediately illustrate such an
implementation for the following knapsack instance with nine variables.


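$$
\begin{aligned}
\max\; r = {} & 12s_1 + 10s_2 + 9s_3 + 7s_4 + 4s_5 + 8s_6 + 11s_7 + 6s_8 + 13s_9 \\
\text{subject to: } & 10s_1 + 12s_2 + 8s_3 + 7s_4 + 5s_5 + 13s_6 + 9s_7 + 6s_8 + 14s_9 \le 45 \\
& s_i \in \{0, 1\} \quad (i = 1, \dots, 9)
\end{aligned}
\qquad (9.1)
$$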
A solution $s$ of this problem is a 0-1 vector, with $s_i = 1$ if the object $i$ is chosen
and $s_i = 0$ otherwise. Each object occupies a certain volume in the knapsack,
whose total capacity is 45. An elementary neighborhood for this problem is to alter
the value of a single variable of $s$.

The taboo conditions can be stored as a vector $t$ of integers, with $t_i$ giving the
iteration up to which the variable $s_i$ may not revert to its previous value. Initially,
Algorithm 9.1: Elementary taboo search framework

Input: Solution s, set M of moves, fitness function f(·) to minimize, parameters I_max, d.
Result: Improved solution s*

1   s* ← s
2   for I_max iterations do
3       best_neighbor_value ← ∞
4       forall m ∈ M (such that m (or s ⊕ m) is not marked as taboo) do
5           if f(s ⊕ m) < best_neighbor_value then
6               best_neighbor_value ← f(s ⊕ m)
7               m* ← m
8       if best_neighbor_value < ∞ then
9           Mark (m*)⁻¹ (or s) as taboo for the next d iterations
10          s ← s ⊕ m*
11          if f(s) < f(s*) then
12              s* ← s
13      else
14          Error message: d too large: no move allowed!


t = 0: at the first iteration, all variables can be modified. For this small instance, let 
us assume a taboo duration of d = 3. The initial solution can be set to s = 0, which 
represents the worst feasible solution to the problem. Table 9.1 gives the evolution 
of a taboo search for this small instance. 

Unsurprisingly, object 9 is put in the knapsack at the first iteration, since this
object has the largest value. At the end of iteration 1, it is forbidden to set $s_9 = 0$
again up to the iteration $t_9 = 4 = 1 + 3$. As long as there is room in the knapsack,
taboo search behaves like a greedy constructive algorithm. At iteration 4, it reaches
the first local optimum s = (1, 1, 0, 0, 0, 0, 1, 0, 1) of value r = 46.


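Table 9.1 Evolution of an elementary taboo search for ten iterations for the knapsack instance (9.1).
This search forbids changing a given variable again for d = 3 iterations


Iteration | Variable | Modified solution   | Fitness | Volume | Taboo status
number    | modified |                     | value   | used   |
1         | 9        | (0,0,0,0,0,0,0,0,1) | 13      | 14     | (0,0,0,0,0,0,0,0,4)
2         | 1        | (1,0,0,0,0,0,0,0,1) | 25      | 24     | (5,0,0,0,0,0,0,0,4)
3         | 7        | (1,0,0,0,0,0,1,0,1) | 36      | 33     | (5,0,0,0,0,0,6,0,4)
4         | 2        | (1,1,0,0,0,0,1,0,1) | 46      | 45     | (5,7,0,0,0,0,6,0,4)
5         | 9        | (1,1,0,0,0,0,1,0,0) | 33      | 31     | (5,7,0,0,0,0,6,0,8)
6         | 3        | (1,1,1,0,0,0,1,0,0) | 42      | 39     | (5,7,9,0,0,0,6,0,8)
7         | 8        | (1,1,1,0,0,0,1,1,0) | 48      | 45     | (5,7,9,0,0,0,6,10,8)
8         | 2        | (1,0,1,0,0,0,1,1,0) | 38      | 33     | (5,11,9,0,0,0,6,10,8)
9         | 4        | (1,0,1,1,0,0,1,1,0) | 45      | 40     | (5,11,9,12,0,0,6,10,8)
10        | 5        | (1,0,1,1,1,0,1,1,0) | 49      | 45     | (5,11,9,12,13,0,6,10,8)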
At iteration 5, an object must be removed, because the knapsack is completely full.
Due to the taboo conditions, only object 9 can be removed. As a result, the fitness
function decreases from r = 46 to r = 33, but space is freed up in the knapsack. At
iteration 6, the best move would be to add object 9 again, but this move is taboo:
it would lead back to the solution visited at iteration 4.

The best authorized move is therefore to add object 3. Then, at the subsequent
iteration, object 8 is added, leading to a new local optimum
s = (1, 1, 1, 0, 0, 0, 1, 1, 0) of value r = 48. The knapsack is again completely full.
At iteration 8, it is necessary to remove an object, setting $s_2 = 0$. The space
thus released makes it possible to add objects 4 and 5, discovering a solution
s = (1, 0, 1, 1, 1, 0, 1, 1, 0) of value r = 49, even better than both local optima previously found.
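The run described above can be replayed with the following Python sketch, which should reproduce Table 9.1 row by row. It assumes that infeasible neighbors (exceeding the capacity) are simply discarded and uses the convention that $t_i$ is the last iteration during which $s_i$ may not be changed; ties would be broken in favor of the lowest index, although none occur on this instance. The function name is illustrative.

```python
VALUES = [12, 10, 9, 7, 4, 8, 11, 6, 13]     # Objective coefficients of (9.1)
VOLUMES = [10, 12, 8, 7, 5, 13, 9, 6, 14]    # Constraint coefficients of (9.1)
CAPACITY = 45


def knapsack_taboo_search(values, volumes, capacity, d, iterations):
    """Elementary taboo search flipping one variable per iteration (maximization).

    Returns the best solution, its value and a trace of
    (iteration, variable modified, solution, value, volume, taboo status).
    """
    n = len(values)
    s, t = [0] * n, [0] * n
    value = volume = 0
    best, best_value = s[:], value
    trace = []
    for it in range(1, iterations + 1):
        chosen, chosen_value = None, float('-inf')
        for i in range(n):
            if t[i] >= it:                       # Taboo: s_i changed too recently
                continue
            sign = 1 if s[i] == 0 else -1        # Add (+1) or remove (-1) object i
            if volume + sign * volumes[i] > capacity:
                continue                         # Infeasible neighbor
            if value + sign * values[i] > chosen_value:
                chosen, chosen_value = i, value + sign * values[i]
        if chosen is None:
            raise RuntimeError('d too large: no move allowed!')
        sign = 1 if s[chosen] == 0 else -1
        s[chosen] = 1 - s[chosen]
        value, volume = chosen_value, volume + sign * volumes[chosen]
        t[chosen] = it + d                       # Forbidden up to iteration it + d
        trace.append((it, chosen + 1, tuple(s), value, volume, tuple(t)))
        if value > best_value:
            best, best_value = s[:], value
    return best, best_value, trace


best, best_value, trace = knapsack_taboo_search(VALUES, VOLUMES, CAPACITY, d=3, iterations=10)
for row in trace:
    print(*row)
print(best, best_value)   # [1, 0, 1, 1, 1, 0, 1, 1, 0] 49
```

Objects are numbered from 1 in the trace to match the table, while Python indices start at 0.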

For the TSP, the type of restrictions described above may be implemented using
a matrix $T$ whose entry $t_{ij}$ provides the iteration from which we can again perform
a move that puts the edge $[i, j]$ back into the tour. This principle extends to any
combinatorial problem for which we search for an optimal permutation.


9.1.2.2 Taboo Duration 


In the previous example, the taboo duration was set to three iterations. This value
may seem arbitrary. If the taboo conditions are removed (duration set to zero),
the search enters a cycle: once a local optimum is reached, an object is removed
and added again at the next iteration. The maximum taboo duration is clearly
limited by the neighborhood size: with a duration at least as large, the search
performs every move of the neighborhood and then remains blocked, since all of
them are prohibited.

These two extreme cases lead to inefficient searches: a zero taboo duration
is equivalent to learning nothing, while a very high duration implies poor learning.
Consequently, a sensible compromise must be found for the taboo duration, and
this duration must be learned for the problem instance treated. Figure 9.1
illustrates this phenomenon for Euclidean TSP instances of n = 100 cities randomly
and uniformly distributed in a square. The taboo search performs $I_{max} = 1000$
iterations.

Battiti and Tecchiolli [1] proposed a learning mechanism called reactive taboo 
search. All the solutions visited by the search are memorized. They can be stored 
in an approximate way, employing the hash technique presented in Section 9.1.1.1. 
The search starts with a restricted taboo duration. If the search visits a solution 
again, then the duration is increased. If the search does not revisit any of the 
solutions during a relatively significant number of iterations, then the taboo duration 
is diminished. 

This last condition seems strange. Why should we force the search to return to 
previously explored solutions? The explanation is as follows: if the taboo duration is 
sufficient to avoid the cycling phenomenon, it also means we are forbidden to visit 
some good solutions because of the taboo status. We are therefore likely to ignore 
high-quality solutions. 
Fig. 9.1 Influence of the taboo duration for Euclidean TSP with 100 cities. A short duration allows
visiting better-quality solutions, on average, but the search cannot escape from local optima;
hence, the quality of the best solutions found is not excellent. Conversely, if the taboo duration
is too high, the average solution quality decreases, as well as that of the best solutions discovered.
In this case, a reasonable compromise seems to be a taboo duration around the instance size. More
generally, the square root of the neighborhood size seems appropriate.


It is therefore necessary to find a taboo duration long enough to avoid cycling but 
as short as possible so as not to prohibit good moves. This is precisely the purpose 
of reactive taboo search. 

However, this learning technique only shifts the problem. Indeed, the user must
determine another parameter: the number of iterations without revisiting a
solution that triggers a decrease of the taboo duration. In addition, it requires a
storage mechanism for all visited solutions, which can be
cumbersome to implement.
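The reactive adaptation of the taboo duration can be sketched as follows. This is only an illustration under assumptions: a Python `set` of hash values stands in for the (possibly approximate) memory of visited solutions, and the update rule (+1 on a revisit, -1 after `patience` iterations without a revisit) is a simple additive choice of ours, not the exact rule of [1]. The class and parameter names are hypothetical; `patience` is the additional parameter discussed above.

```python
class ReactiveTabooDuration:
    """Adapt the taboo duration from revisits of solutions (hypothetical helper)."""

    def __init__(self, initial, patience, minimum=1):
        self.duration = initial        # Current taboo duration
        self.patience = patience       # Iterations without revisit before decreasing
        self.minimum = minimum
        self.visited = set()           # Hash values of the solutions visited so far
        self.last_change = 0           # Iteration of the last revisit or decrease

    def observe(self, hash_value, iteration):
        """Record the solution visited at this iteration; return the new duration."""
        if hash_value in self.visited:               # Revisit: cycling suspected
            self.duration += 1
            self.last_change = iteration
        else:
            self.visited.add(hash_value)
            if iteration - self.last_change >= self.patience:
                self.duration = max(self.minimum, self.duration - 1)
                self.last_change = iteration
        return self.duration


reactive = ReactiveTabooDuration(initial=2, patience=3)
hashes = [10, 11, 10, 12, 13, 14]       # Solution 10 is revisited at iteration 3
durations = [reactive.observe(h, it) for it, h in enumerate(hashes, start=1)]
print(durations)  # [2, 2, 3, 3, 3, 2]
```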

Another technique for choosing low taboo durations while strongly preventing
the cycling phenomenon is to draw the duration at random at each iteration. A classic method
is to select the taboo duration at random between a minimum duration $d_{min}$ and a
maximum value $d_{max} = d_{min} + \Delta$.
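A minimal sketch of this random taboo duration, assuming integer durations drawn uniformly in $[d_{min}, d_{min} + \Delta]$ and a dictionary that stores, for each move, the last iteration during which it is taboo. The helper names are illustrative.

```python
import random


def random_taboo_duration(d_min, delta, rng=random):
    """Draw a taboo duration uniformly in [d_min, d_min + delta] (integers)."""
    return rng.randint(d_min, d_min + delta)


def mark_taboo(taboo_until, move, iteration, d_min, delta, rng=random):
    """Forbid `move` up to iteration + a freshly drawn duration."""
    taboo_until[move] = iteration + random_taboo_duration(d_min, delta, rng)


rng = random.Random(42)
durations = [random_taboo_duration(5, 10, rng) for _ in range(1000)]
print(min(durations), max(durations))   # Both lie within [5, 15]

taboo_until = {}
mark_taboo(taboo_until, 'm', iteration=10, d_min=3, delta=0, rng=rng)
print(taboo_until)                      # {'m': 13}: delta = 0 is deterministic
```

Setting $\Delta = 0$ recovers the deterministic taboo duration used in the knapsack example.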

To create Fig. 9.2, 500 QAP instances of size n = 12 with known optimal
solutions have been generated. For each instance, we have performed a taboo search
with a considerable number of iterations (for instances that small) with all possible
parameters $(d_{min}, \Delta)$. The number of optimal solutions found for each couple was
then counted. If the search succeeded in finding all 500 optimal solutions, the
average number of iterations needed to reach the optimum was recorded.

With a deterministic taboo duration ($\Delta = 0$), it was never possible to find
all the optimal solutions, even with a relatively large duration. Conversely, with
Fig. 9.2 Taboo duration randomly generated between $d_{min}$ and $d_{min} + \Delta$. An empty circle
indicates that taboo search has been unable to systematically find the optimum of the 500 QAP
instances. The circle size is proportional to the number of optima found (the larger, the better). A
filled disc indicates that the optimum has been systematically found. The disc size is proportional
to the average number of iterations required for obtaining the optimum (the smaller, the better)


low minimum durations and a random variation equal to the size of the problem,
the optimum is systematically obtained. Moreover, the optimum is reached with
relatively few iterations.

Figure 9.3 reproduces a similar experiment for the TSP. It provides the solution
quality obtained for a small instance for every couple $(d_{min}, \Delta)$. We observe similarities
with Fig. 9.2. A random taboo duration of the order of half the instance size is
a reasonable compromise.


9.1.2.3 Aspiration Criterion 


The unconditional prohibition of a move can cause unwanted situations. For
instance, the search may skip a move that would improve the best solution found.
Therefore, Line 4 of Algorithm 9.1 is modified so that a move m leading to a solution
better than s* is retained even if it is taboo. In the literature, this is referred to as an
aspiration criterion. Other less trivial aspiration criteria can be imagined, in particular
to implement a long-term memory.
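The modified Line 4 can be sketched as a move-selection function. It assumes that each candidate is given as a triple (move, value of the resulting solution, taboo flag) and that the problem is a minimization; the function name is illustrative.

```python
def select_move(candidates, best_value):
    """Best move among (move, value, is_taboo) triples, for minimization.

    A taboo move is still admissible (aspirated) if it leads to a solution
    better than the best one found so far, best_value.
    """
    chosen, chosen_value = None, float('inf')
    for move, value, is_taboo in candidates:
        admissible = not is_taboo or value < best_value   # Aspiration criterion
        if admissible and value < chosen_value:
            chosen, chosen_value = move, value
    return chosen, chosen_value


candidates = [('a', 5, False), ('b', 2, True), ('c', 4, False)]
print(select_move(candidates, best_value=3))  # ('b', 2): taboo but aspirated
print(select_move(candidates, best_value=1))  # ('c', 4): 'b' stays forbidden
```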
Fig. 9.3 Quality of the solutions obtained with a taboo search where the taboo duration is randomly
chosen between $d_{min}$ and $d_{min} + \Delta$ for a classical instance with n = 127 cities. The method
performs 10n iterations starting from a deterministic greedy nearest neighbor tour. For all values of
$d_{min}$ between 0 and 1.5n and $\Delta$ between 0 and 2n, we launched a search and represented the
solution quality by a color (% above optimum)

[Plot: solution quality as a color over $d_{min}$ and $\Delta$; scale from Optimum to +0.6%, +1.2% and +1.8%]

9.2 Strategic Oscillations 


Forbidding the inverse of recently performed moves implements a short-term
memory. This mechanism can be very efficient for instances of moderate size.
However, when addressing more complex problems, this mechanism alone is not
sufficient. A search strategy that has been proposed in the context of taboo search
is to alternate intensification and diversification phases.

The goal of intensification is to thoroughly examine a limited portion of the 
search space, maintaining solutions that possess a globally similar structure. Once 
all the attractive solutions of this portion have presumably been discovered, the search has
to go elsewhere. Put differently, the search is diversified by altering the structure of 
the solution. The search intensification can be implemented with a short duration 
taboo list. 


9.2.1 Long-Term Memory


Implementing a diversification mechanism requires a long-term memory.
Several techniques have been proposed to achieve this.




9.2.1.1 Forced Moves 


The most certain and convenient way to break the structure of a solution is to
perform moves that have not been selected for many iterations. With a
basic taboo search memorizing the iteration from which each move can again be 
performed, the implementation of this form of long-term memory is virtually free. 
Indeed, if the iteration number stored for a move is considerably smaller than the 
current iteration, then this move has not been selected for a long time. 

It is thus possible to force the use of this modification, regardless of the quality 
of the solution to which it leads. This mechanism requires a new parameter, K,
representing the number of iterations after which a move that has not been chosen
is forced. Naturally, this parameter must be larger than the size of the neighborhood;
otherwise the search degenerates, performing only forced moves. If several moves are
to be forced at a given iteration, one is chosen arbitrarily; the others will be forced in
subsequent iterations. This type of long-term memory represents a kind of aspiration
criterion, introduced in the previous section.
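Detecting a move to force can be sketched by scanning the array in which a basic taboo search stores, for each move, the iteration from which it can again be performed (0 for a move never performed). The function name is hypothetical, and returning the first qualifying move found is one possible way of choosing "arbitrarily" among several candidates.

```python
def forced_move(taboo_until, iteration, K):
    """Return the index of a move to force, or None (hypothetical helper).

    taboo_until[m] is the iteration from which move m can again be performed.
    A move whose stored value lags the current iteration by at least K has
    not been chosen for a long time and is forced, whatever its quality.
    """
    for move, t in enumerate(taboo_until):
        if iteration - t >= K:
            return move          # The others will be forced at later iterations
    return None


taboo_until = [0, 95, 40, 100]
print(forced_move(taboo_until, 120, 50))    # 0: never performed
print(forced_move(taboo_until, 120, 130))   # None: no move to force
```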


9.2.1.2 Penalized Moves 


A weakness of taboo search with very short-term memory is that it only makes small 
changes. To illustrate this on the TSP, such a search will “knit” a small knot on a tour 
that was locally optimal, then another elsewhere and so on until the taboo condition 
drops. At that point, the search unknits the first knot. This situation is illustrated in 
Fig. 9.4. 


Fig. 9.4 A basic taboo 
search with a short-term 
memory can enter cycling 
with this 2-optimal tour. 
Indeed, this solution belongs 
to a plateau. There are eight 
moves not changing the tour 
length (dotted lines). With a 
taboo duration shorter than 
eight, the search repeatedly 
chooses one of these moves 
To avoid this behavior, one idea is to store the number of times $f_m$ that a move $m$ was
chosen and to limit its use: during the move evaluation, a penalty $F \cdot f_m$ is added.
The proportionality factor $F$ is a new parameter of the technique that must be tuned.
Naturally, an aspiration criterion must be used in conjunction with this mechanism.
Indeed, the search should nevertheless be allowed to choose a heavily penalized
move leading to an improvement of the best solution known.

Code 9.1 provides a taboo search implementation for the TSP, based on the 2-opt
neighborhood. Two types of memories are employed: a conventional short-term
memory that prevents moves from reintroducing both edges that have been recently
removed, and a long-term memory that counts the number of times each edge has
been inserted in the solution.

A move is penalized proportionally to the number of times the concerned
edges have been introduced in the solution. A move is forbidden if both edges
have recently been removed from the solution (possibly at different iterations).
Finally, a move is aspired if it improves on the best solution achieved so far.
Code 9.1 tsp_TS.py Taboo search implementation for the TSP


from random_generators import unif  # Listing 12.1


def tsp_TS(d,                  # Distance matrix
           tour,               # Initial tour provided
           length,             # Length of initial tour
           iterations,         # Number of tabu search iterations
           min_tabu_duration,  # Minimal tabu duration
           max_tabu_duration,  # Maximal tabu duration
           F):                 # Factor for penalizing moves repeatedly performed

    n = len(tour)
    tabu = [[0] * n for _ in range(n)]   # Tabu list
    count = [[0] * n for _ in range(n)]  # Move count
    best_tour = tour[:]
    best_length = length

    for iteration in range(iterations):
        delta_penalty = float('inf')
        ir = jr = -1                     # Cities retained for performing a move

        # Find best move allowed or aspired
        for i in range(n - 2):
            j = i + 2
            while j < n and (i > 0 or j < n - 1):
                delta = d[tour[i]][tour[j]] + d[tour[i + 1]][tour[(j + 1) % n]] \
                      - d[tour[i]][tour[i + 1]] - d[tour[j]][tour[(j + 1) % n]]
                penalty = F * (count[tour[i]][tour[j]]
                               + count[tour[i + 1]][tour[(j + 1) % n]])
                # Conditions for accepting a candidate move
                better = delta + penalty < delta_penalty
                allowed = tabu[tour[i]][tour[j]] <= iteration \
                    or tabu[tour[i + 1]][tour[(j + 1) % n]] <= iteration
                aspirated = length + delta < best_length
                if better and (allowed or aspirated):
                    delta_penalty = delta + penalty
                    ir, jr = i, j
                j += 1                   # Next neighbor

        # Perform retained move
        if delta_penalty < float('inf'):
            # Forbid reinserting the two removed edges for a random duration
            tabu[tour[ir]][tour[ir + 1]] = tabu[tour[jr]][tour[(jr + 1) % n]] = \
                tabu[tour[ir + 1]][tour[ir]] = tabu[tour[(jr + 1) % n]][tour[jr]] = \
                unif(min_tabu_duration, max_tabu_duration) + iteration
            # Long-term memory: update the counts of the two removed edges
            count[tour[ir]][tour[ir + 1]] += 1
            count[tour[jr]][tour[(jr + 1) % n]] += 1
            count[tour[ir + 1]][tour[ir]] += 1
            count[tour[(jr + 1) % n]][tour[jr]] += 1
            length += d[tour[ir]][tour[jr]] + d[tour[ir + 1]][tour[(jr + 1) % n]] \
                    - d[tour[ir]][tour[ir + 1]] - d[tour[jr]][tour[(jr + 1) % n]]
            for k in range((jr - ir) // 2):  # Reverse the path between ir+1 and jr
                tour[k + ir + 1], tour[jr - k] = tour[jr - k], tour[k + ir + 1]
        else:
            print('All moves are forbidden: tabu list too long')

        if best_length > length:         # Is there an improvement?
            best_length = length
            best_tour = tour[:]
            print('TS {} {}'.format(iteration + 1, best_length))
    return best_tour, best_length
Fig. 9.5 Same diagram as Fig. 9.3 but with a taboo search managing a long-term memory. Frequently
performed moves are penalized. The value of the penalty is $F \cdot n_e$, where $n_e$ is the number of
times the edge $e$ has been included in or removed from the tour, and the value of $F$ is the average
length of an edge divided by the instance size

[Plot: solution quality as a color over $d_{min}$ and $\Delta$; scale from Optimum to +0.6%, +1.2% and +1.8%]


Figure 9.5 illustrates the quality of this taboo search performing 10n iterations on
a TSP instance with n = 127 cities. The search implements the penalty mechanism
based on the frequency of moves. Compared to a taboo search not employing this 
mechanism (Fig. 9.3), the taboo duration can be reduced and the search achieves 
good solutions more frequently. This mechanism could even be operated alone, 
without a taboo list. Indeed, an excellent solution is obtained with a minimum and 
maximum taboo duration of 0. A taboo list can be implemented by means of a matrix 
whose entry (i, j) gives the iteration number from which one can again use the edge 
[i, j] in a move. Counting the frequency of moves is implemented in a similar way. 


9.2.1.3 Restarts 


A frequently used technique to intensify a taboo search is to restart with the best 
solution achieved so far. This is done if the search seems to stagnate, for instance, 
if there has been no improvement in the best solution during a relatively significant 
number of iterations. When restarting, the information collected during the previous 
iterations by the taboo list is kept, as well as other statistics, if any. Hence, the work 
achieved during these iterations is exploited. 

Since the data structures guiding the search are in an altered state after
restarting, the trajectory followed by the search will also differ. This mechanism
can be seen as the opposite of the one presented above, where we force the
use of attributes neglected for many iterations: its purpose is to achieve search
intensification, not diversification. Naturally, the implementation of this mechanism
implies the introduction of new parameters that must be adjusted, like the number
of iterations to be carried out before a restart and a possible adaptation of the value
of other parameters (taboo duration, frequency penalty) to guide the search toward 
diversified trajectories. 
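The restart decision can be sketched as a small controller that the taboo search calls once per iteration. This is a sketch under assumptions: minimization, a hypothetical `patience` parameter (the number of iterations without improvement before a restart), and a caller that keeps its taboo list and other statistics unchanged when it receives the best solution back. If solutions are mutable, the caller should copy them before passing them in.

```python
class RestartController:
    """Decide when to restart from the best solution (hypothetical helper)."""

    def __init__(self, patience):
        self.patience = patience                 # Iterations without improvement
        self.best, self.best_value = None, float('inf')
        self.stagnation = 0

    def step(self, current, value):
        """Return the solution from which the search should continue."""
        if value < self.best_value:              # Improvement of the best solution
            self.best, self.best_value = current, value
            self.stagnation = 0
            return current
        self.stagnation += 1
        if self.stagnation >= self.patience:     # The search seems to stagnate
            self.stagnation = 0
            return self.best                     # Restart from the best solution
        return current


controller = RestartController(patience=2)
steps = [controller.step(s, v) for s, v in [('a', 5), ('b', 6), ('c', 7), ('d', 4)]]
print(steps)  # ['a', 'b', 'a', 'd']
```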


Problems 


9.1 Taboo Search for an Explicit Function 

An integer function of integer variables f(x, y) is explicitly given in Table 5.1. We
seek the minimum of this function by applying a taboo search. The neighborhood
consists of modifying the value of one variable by one unit. The taboo conditions
consist in forbidding to increment (respectively: decrement) a variable that has
been decremented (respectively: incremented). First, consider a taboo duration of
d = 3 and (−7, −6) as the starting solution. Next, start from (−7, 7) and use d = 1.
The search stops if there is no more move allowed or if 25 iterations have been
performed.


9.2 Taboo Search for the VRP 

For the VRP, the neighborhood consists of either moving a customer from one route
to another, or swapping two customers from different routes. Suggest taboo criteria 
for this neighborhood. 


9.3 Taboo Search for the QAP 
Consider the QAP instance given by the flow F and distance D matrices: 


$$
F = \begin{pmatrix}
0 & 5 & 2 & 4 & 1 \\
5 & 0 & 3 & 0 & 2 \\
2 & 3 & 0 & 0 & 0 \\
4 & 0 & 0 & 0 & 5 \\
1 & 2 & 0 & 5 & 0
\end{pmatrix}
\qquad
D = \begin{pmatrix}
0 & 1 & 1 & 2 & 3 \\
1 & 0 & 2 & 1 & 2 \\
1 & 2 & 0 & 1 & 2 \\
2 & 1 & 1 & 0 & 1 \\
3 & 2 & 2 & 1 & 0
\end{pmatrix}
$$

Starting with the solution p = (1, 2, 3, 4, 5), perform six iterations of a taboo
search. The moves are defined by pairs (i, j) that swap the elements $p_i$ and $p_j$.
If the move (i, j) is performed, then it is forbidden for d = 5 iterations to place the
element $p_i$ in position i and, simultaneously, the element $p_j$ in position j. For each
iteration, provide the solution, its value, the values of all the moves and their taboo status.
Chapter 10 Population Management


By abuse of language, all the methods previously presented can be classified as
single-solution metaheuristics. Although most of these methods build or modify
many different solutions, they only consider one current solution at each iteration
and, possibly, the best solution found so far. This classification could
be disputed, especially for the ant system. Indeed, several solutions are built at a
given iteration. However, an ant constructs a solution without taking care of the
work done in parallel by the other ants, and all the solutions built in one iteration
are forgotten once the trails are updated. Similarly, there are taboo searches storing
several solutions, but they are used for determining the taboo status of the neighbors
of the current solution. This chapter considers methods where several solutions are
explicitly stored and iteratively used for generating or modifying other ones.

With a proper modeling of an optimization problem, it is very easy to construct 
many different solutions, especially by means of a randomized method. Therefore, 
one can try to learn how to create new solutions from those previously constructed. 
This chapter studies how to exploit a population of solutions and how to combine 
the various basic metaheuristic components studied above. 

Let us illustrate this by the tour merging technique for the TSP. Figure 10.1
shows five tours obtained with a randomized method in $O(n \log n)$ presented in
Section 6.3.2. None of these solutions looks particularly good. However, superimposing
these solutions on the optimal solution reveals that all the edges of the latter are part
of these tours. Therefore, we believe that intelligent exploitation of various solutions
can help to discover better ones.


10.1 Evolutionary Algorithms Framework 


The intuition at the source of evolutionary algorithms comes from the work of
nineteenth-century biologists like Darwin and Mendel, who laid the foundations of the theory of the
Fig. 10.1 Optimal tour of the TSP instance tsp225, on which five tours obtained by the fast method
presented in Section 6.3.2 are superimposed


evolution of living species. Indeed, over the course of generations, living beings 
are able to adapt to constantly changing external conditions. They can optimize 
their survival probability, thus resolving extremely complex problems. Therefore, 
why not attempt to artificially reproduce this evolution to solve hard combinatorial 
optimization problems? 

In the 1960s and 1970s, various ways of exploiting these ideas emerged. The
general framework of evolutionary algorithms is provided by Algorithm 10.1.
One begins by generating a set of $\mu$ solutions to the problem, usually in a purely
random fashion. This set of solutions is called a population, by analogy with a
group of living beings. In the same way, a solution to the problem is called an individual.
Evolutionary algorithms repeat the following loop (called a generational loop) until a
stopping criterion is met. This criterion is either set in advance, for example, the number of
times the generational loop is repeated, or decided on the basis of the diversity of
individuals present in the population.

First, a number of solutions from the population are selected to be used for
breeding. This is achieved by a selection operator for reproduction. The purpose of
this operator is to favor the individuals that are well adapted to their environment
(those with the best fitness function) at the expense of those that are weaker, sick,
and ill-adapted, as happens in nature.

The selected individuals are then mixed together (e.g., in pairs) using a crossover
operator to form $\lambda$ new solutions called offspring, which undergo random
modifications by means of a mutation operator. These two operators simulate the sexual
reproduction of living species, assuming that, with a little luck, the favorable
characteristics (the desirable genes contained in the DNA) of the parent solutions
Algorithm 10.1: Framework of evolutionary algorithms

Input: Parameters μ and λ; selection for reproduction, crossover, mutation and selection
for survival operators
Result: Population of solutions P

1   Generate a population P of μ solutions
2   repeat
3       Select individuals from P with the selection for reproduction operator
4       Combine the selected individuals with the crossover operator and apply the mutation
        operator to get λ new solutions
5       Among the μ + λ solutions, select μ individuals with the selection for survival
        operator; these μ individuals constitute the population P for the next generation
6   until a stopping criterion is satisfied


[Diagram of the generational loop: Population, Selection for reproduction, Parents, Crossover,
Mutation, Offspring, Fitness evaluation, Selection for survival]


Fig. 10.2 Generational loop in an evolutionary algorithm. From a population of solutions, symbol- 
ized here by colored sticks, one selects individuals who reproduce by crossover and mutation. The 
offspring thus generated is evaluated and incorporated into the population. Ultimately, individuals 
are eliminated by a selection operator for survival to bring the population back to its initial size 


will be transmitted to their children and that fortuitous mutations will result in the
appearance of new favorable genes.

Finally, the new solutions are evaluated, and a selection operator for survival
eliminates $\lambda$ solutions from the $\mu + \lambda$ available to bring the population back
to $\mu$ individuals. Figure 10.2 illustrates the process of a generational loop.
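The generational loop of Algorithm 10.1 can be sketched in Python on a toy problem. Every concrete choice here is an assumption made only for illustration, since the operators are discussed later: solutions are bit vectors, the fitness to maximize is the number of ones (passed as `sum`), parents are drawn uniformly, crossover is one-point, mutation flips each bit with probability `p_mut`, and survival keeps the $\mu$ best of the $\mu + \lambda$ solutions.

```python
import random


def evolutionary_algorithm(fitness, n_bits, mu, lam, generations, p_mut, rng):
    """(mu + lambda) evolutionary algorithm on bit vectors, maximizing fitness."""
    # Line 1: random initial population of mu individuals
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(mu)]
    for _ in range(generations):                 # Lines 2 and 6
        offspring = []
        for _ in range(lam):
            # Line 3: uniform selection of two parents for reproduction
            parent1, parent2 = rng.sample(population, 2)
            # Line 4: one-point crossover followed by bit-flip mutation
            cut = rng.randint(1, n_bits - 1)
            child = parent1[:cut] + parent2[cut:]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            offspring.append(child)
        # Line 5: selection for survival keeps the mu best of the mu + lambda solutions
        population = sorted(population + offspring, key=fitness, reverse=True)[:mu]
    return population


final = evolutionary_algorithm(sum, n_bits=20, mu=10, lam=10, generations=30,
                               p_mut=0.05, rng=random.Random(0))
print(max(sum(x) for x in final))
```

Because survival selection keeps the $\mu$ best of parents and offspring together, the best fitness in the population can never decrease from one generation to the next.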

The framework of evolutionary algorithms leaves considerable freedom in the 
choices to be made for the implementation of the various operators and parameters. 
For instance, the "evolution strategy" of Rechenberg [9] does not use a crossover
operator between two solutions. In this technique, the solutions of the population 
are modified with a mutation operator and compete with each other, much like 
parthenogenetic reproduction. 
Among evolutionary algorithms, it is undoubtedly the genetic algorithms (GA)
proposed by Holland [3] that have received the most attention. This is paradoxical,
since the purpose of his study was to understand the convergence mechanisms of
these algorithms, not their ability to optimize difficult problems. For a long time,
the community in this field continued to work on the genetic algorithm convergence
theory, studying "orthodox" versions of the various operators mentioned above, in
conjunction with a standard representation of solutions in the form of Boolean
vectors of a specified size.

Unfortunately, not all optimization problems have solutions that can be naturally
represented by binary vectors. In order to use only standard operators whose
theoretical properties are known, considerable efforts have been made to discover
appropriate encodings of solutions in the form of binary vectors and to decode them
into feasible solutions.

For the problems whose solutions are naturally represented by a permutation, the
random key coding technique allows exploiting the standard crossover and mutation
operators. A permutation of the elements 1, ..., n is represented by an array $t$
of n real numbers. The permutation $p$ that sorts $t$ corresponds to the
solution coded by the array (see Fig. 10.12).
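Random key decoding can be sketched in a few lines. Here "the permutation that sorts $t$" is read as the list of indices ordered by increasing key (an argsort), with elements numbered from 0 rather than 1; this convention is an assumption and may differ from the one of Fig. 10.12. The example shows why the technique is attractive: a standard one-point crossover applied to the keys always decodes to a valid permutation.

```python
def decode_random_keys(keys):
    """Permutation p sorting the keys: keys[p[0]] <= keys[p[1]] <= ..."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


a = [0.7, 0.1, 0.5, 0.3]
b = [0.2, 0.9, 0.4, 0.6]
print(decode_random_keys(a))              # [1, 3, 2, 0]
child = a[:2] + b[2:]                     # Standard one-point crossover on the keys
print(decode_random_keys(child))          # [1, 2, 3, 0]: still a permutation
```

Since Python's `sorted` is stable, equal keys are decoded in increasing index order.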

The next sections review the main genetic algorithm operators, discussing how 
they can be generalized so that they equally apply to a natural representation of 
solutions and not only to binary vectors. 


10.2.1 Selection for Reproduction 


The selection for reproduction aims to favor the most efficient solutions so that 
they can transmit their beneficial properties to their offspring. Each solution $i$ must
therefore be assigned a fitness measure $f_i$: the higher the quality, the higher the
selection probability must be. If the objective of the problem to be solved is to
maximize a function admitting positive values, this function can be directly used as 
fitness function. Otherwise, a transformation of the objective function is required to 
assign a fitness to each individual. 


10.2.1.1 Rank-Based Selection 


A traditional transformation is to sort the individuals. This does not require the
computation of an objective function but only the possibility to compare the quality
of solutions. The fittest individual in a population has rank 1 and the worst has rank $\mu$.
The individual $i$ of rank $r_i$ has a quality measure

$$f_i = \left(1 - \frac{r_i}{\mu + 1}\right)^p,$$

where $p \ge 0$ is a parameter to modulate the selection pressure. A pressure $p = 0$ implies
a uniform draw among the population (no selective pressure), while $p = 2$ represents a fairly
high pressure. Code 10.1 provides an implementation of this operator for a selection
pressure of $p = 1$.


Code 10.1 rank_based_selection.py Implementation of a rank-based selection operator for
reproduction, with selective pressure p = 1. The best of μ individuals has a probability of
2/(μ + 1) of being selected, while the worst has a probability of 2/(μ² + μ)


import math
from random_generators import unif  # Listing 12.1

######### Selection operator for reproduction based on the rank
def rank_based_selection(size):
    # Draw u uniformly in 1..size(size+1)/2 and invert the triangular numbers:
    # rank size - k is returned with probability proportional to k (rank 0 = best)
    return int(size
               - math.ceil(math.sqrt(.25 + 2 * unif(1, size * (size + 1) // 2)) - .5))
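Code 10.1 handles only p = 1. For an arbitrary pressure p, one option is to compute the weights directly from the formula $f_i = (1 - r_i/(\mu+1))^p$ and let Python's standard `random.choices` draw a rank. The sketch below does this. The function names and the `rng` parameter are hypothetical, and it uses the standard `random` module instead of the book's `random_generators`.

```python
import random


def rank_weights(mu, p):
    """Weights (1 - r / (mu + 1)) ** p for ranks r = 1 (best) .. mu (worst)."""
    return [(1 - r / (mu + 1)) ** p for r in range(1, mu + 1)]


def rank_selection_probabilities(mu, p):
    """Selection probability of each rank, obtained by normalizing the weights."""
    weights = rank_weights(mu, p)
    total = sum(weights)
    return [w / total for w in weights]


def select_rank(mu, p, rng=random):
    """Draw a 0-based rank (0 = best) with probability proportional to its weight."""
    return rng.choices(range(mu), weights=rank_weights(mu, p), k=1)[0]
```

For μ = 10 and p = 1, `rank_selection_probabilities(10, 1)[0]` equals 2/11 ≈ 0.182, which matches the probability 2/(μ + 1) stated for Code 10.1. With p = 0, every rank gets 0.1.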


10.2.1.2 Proportional Selection 


The simplest selection operator randomly draws an individual with a probability
proportional to its fitness. The individual i thus has a probability $f_i / \sum_j f_j$ of being selected. In
principle, not just one individual is selected at each generational loop, but several.
The selection is ordinarily performed with replacement, so that a (good) individual
can be selected several times in one generation.
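Proportional selection with replacement is often implemented as a "roulette wheel". The cumulative fitness values define one slot per individual, and a uniform point on the wheel selects a slot through binary search. The sketch below is a minimal version of that idea. It assumes non-negative fitness values with a positive sum, and the function name is hypothetical.

```python
import bisect
import itertools
import random


def proportional_selection(fitness, k, rng=random):
    """Roulette-wheel selection with replacement.

    Returns k indices; index i is drawn with probability fitness[i] / sum(fitness).
    """
    cumulative = list(itertools.accumulate(fitness))
    total = cumulative[-1]
    last = len(fitness) - 1
    # A point in [0, total) falls in the slot of the first cumulative value above it;
    # min() only guards against a floating-point product rounding up to total
    return [min(bisect.bisect_right(cumulative, rng.random() * total), last)
            for _ in range(k)]
```

Individuals with zero fitness get a zero-width slot and are never drawn.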

Genetic algorithms are inherently parallel: the generational loop can be applied
both to the production of a single individual in each generation, as shown in
Fig. 10.2, and to the generation of a multitude of offspring. A frequently used
technique is to select an even number λ of parent solutions in a generation and
pair them up, each pair producing two offspring by crossover.


10.2.1.3 Natural Selection 


It is also possible to perform a purely random and uniform selection for
reproduction, as happens in many living species. The convergence of the
algorithm must then be guided by the selection operator for survival, which ensures
a bias toward the fittest individuals. Table 10.1 compares the selection probabilities
of the operators presented above for a small population.


10.2.1.4 Complete Selection 


If the population size is not too large, it is also possible to involve
all individuals systematically in reproduction. As with natural selection, the
evolution of the population toward good solutions then depends on the selection
operator for survival, which should favor the best solutions.


Table 10.1 Selection probability for different operators for reproduction. The objective function
is to be maximized and is directly used as a fitness function for the proportional selection. The
sum of the values of the objective function is 1000


Objective   Rank   Probability
function           Rank-based (p = 1)   Natural   Proportional
220         1      0.182                0.1       0.220
162         2      0.164                0.1       0.162
157         3      0.145                0.1       0.157
93          4      0.127                0.1       0.093
85          5      0.109                0.1       0.085
74          6      0.091                0.1       0.074
61          7      0.073                0.1       0.061
55          8      0.055                0.1       0.055
49          9      0.036                0.1       0.049
44          10     0.018                0.1       0.044


10.2.2 Crossover Operator 


A crossover operator aims to simulate the sexual reproduction of living species. 
Schematically, the process of meiosis in sexual reproduction separates the DNA of 
each parent into two genetic sequences. This produces gametes (egg cell, sperm, 
or pollen grains). During the fertilization of the egg cell, genetic shuffling occurs, 
during which the sequence of genes of the offspring is produced by sequentially 
adding the genes of either parent in an arbitrary fashion. 

The purpose of this operator is to produce a new offspring, different from its 
parents, but having inherited some of their features. With a little luck, the offspring 
receives good features from its parents and is better adapted to its environment. With 
a little less luck, the offspring does not receive those good features. Nevertheless, it 
perpetuates valuable genes and provides a source of diversity within the population, 
which means potential for innovation. 

Figure 10.3 metaphorically illustrates this with the mating of different ladybird 
beetles. The couples at the top are likely to produce children very similar to 
themselves, while the couples at the bottom of the figure might produce genetically 
richer children. 

Some evolutionary strategies have no crossover operator. These
strategies mimic asexual reproduction, where an individual produces an offspring
practically identical to itself and only spontaneous mutations cause the population's
gene pool to evolve.
Fig. 10.3 Ladybird beetles mating. One can imagine the top couples will produce children very
similar to themselves, while the lower ones will keep some genetic diversity in the population and,
hopefully, produce some children better adapted to their environment
Fig. 10.4 Uniform crossover. Production of two complementary offspring from the genes of two
parents. Each item of the first offspring is chosen at random from either parent by flipping a coin.
The second offspring receives the complementary item


10.2.2.1 Uniform Crossover 


Uniform crossover takes two parent solutions, represented as vectors of n
items, and creates a third one by choosing each of its items from either parent with equal
probability. Figure 10.4 illustrates the production of two “anti-twin” offspring from
two parents. This crossover operator is appropriate if it is straightforward and logical
to represent any solution of the problem by a vector of n components and if any
vector of that size corresponds to a feasible solution.
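The production of the two "anti-twins" of Fig. 10.4 can be sketched as follows. This is a minimal version, assuming that solutions are Python lists of equal length. The function name and the `rng` parameter are hypothetical, and the standard `random` module is used.

```python
import random


def uniform_crossover(parent1, parent2, rng=random):
    """Return two complementary offspring ("anti-twins") of two equal-length vectors."""
    child1, child2 = [], []
    for a, b in zip(parent1, parent2):
        if rng.random() >= 0.5:  # Coin flip: swap the roles of the two parents
            a, b = b, a
        child1.append(a)         # child1 receives one parent's item...
        child2.append(b)         # ...and child2 receives the complementary one
    return child1, child2
```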

This is not the case for a problem where a permutation of n items is sought. One
technique for adapting the uniform crossover to this situation proceeds in two
phases. In the first phase, the items of the permutation are randomly selected from
either parent, provided that the item has not yet been chosen. If the items of both parents
at the position to be filled have already been selected, this position temporarily remains
empty in the offspring. The second phase fills in the vacant
positions at random with the items that were not selected during the first phase. This operator
is illustrated in Fig. 10.5.


Fig. 10.5 Uniform crossover on a permutation. An offspring is produced in two phases. We first
successively choose the items of one or the other of the parents, as long as they are not part of the
offspring. Otherwise, either we leave the position empty if both items are already in the offspring
or we select the unique item available. The second phase randomly completes the offspring using
the remaining items
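The two phases described above can be sketched in a few lines. This version assumes that permutations are Python lists of distinct items (none of them equal to `None`, which marks an empty position). The function name is hypothetical.

```python
import random


def uniform_permutation_crossover(parent1, parent2, rng=random):
    """Two-phase uniform crossover producing a valid permutation."""
    n = len(parent1)
    child = [None] * n
    used = set()
    # Phase 1: at each position, try the item of a randomly chosen parent first
    for i in range(n):
        first, second = parent1[i], parent2[i]
        if rng.random() < 0.5:
            first, second = second, first
        if first not in used:
            child[i] = first
        elif second not in used:
            child[i] = second
        else:
            continue  # Both items already in the child: leave the position empty
        used.add(child[i])
    # Phase 2: fill the empty positions with the unused items, in random order
    remaining = [x for x in parent1 if x not in used]
    rng.shuffle(remaining)
    holes = [i for i, x in enumerate(child) if x is None]
    for i, x in zip(holes, remaining):
        child[i] = x
    return child
```

The number of empty positions always equals the number of unused items, so the second phase completes a valid permutation.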


10.2.2.2 Single-Point Crossover


The single-point crossover first randomly picks a point within the solution vector. 
Then it copies all the items of the first parent up to that point. Finally, it copies 
the items of the second parent from there. In practice, for a vector of n items, we 
randomly draw a number c between 1 and n - 1; we copy the items 1 to c from
the first parent and the items c + 1 to n from the second parent. We can produce a 
second complementary offspring in parallel. Figure 10.6 illustrates this operator. 
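In Python, the slicing described above takes only a few lines. This is a sketch assuming list-encoded solutions, with a hypothetical function name. Python indices are 0-based, so `parent1[:c]` holds the items 1 to c of the text.

```python
import random


def single_point_crossover(parent1, parent2, rng=random):
    """Cut both parents at a random point c in 1..n-1 and exchange the tails."""
    n = len(parent1)
    c = rng.randint(1, n - 1)  # randint is inclusive: 1 <= c <= n - 1
    return parent1[:c] + parent2[c:], parent2[:c] + parent1[c:]
```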


10.2.2.3 Two-Point Crossover 


The two-point crossover consists in randomly selecting two different points. The 
offspring is created by copying the items before the first point and after the second 
point from one parent and copying the portion between the points from the other 
parent. This operator is illustrated in Fig. 10.7. The strategy can be generalized by 
choosing k crossover points. 
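The generalization to k crossover points can be sketched as follows. The parents alternate as sources after each cut, and k = 2 gives the two-point crossover of Fig. 10.7. This is a minimal sketch with a hypothetical function name, assuming list-encoded solutions and k ≤ n - 1.

```python
import random


def k_point_crossover(parent1, parent2, k, rng=random):
    """Crossover with k distinct cut points; returns two complementary offspring."""
    n = len(parent1)
    cuts = sorted(rng.sample(range(1, n), k))  # k distinct cut points in 1..n-1
    child1, child2 = [], []
    src1, src2 = parent1, parent2
    start = 0
    for end in cuts + [n]:
        child1.extend(src1[start:end])
        child2.extend(src2[start:end])
        src1, src2 = src2, src1  # Switch parents after each cut
        start = end
    return child1, child2
```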
Fig. 10.6 Single-point crossover. Production of two “anti-twins” by randomly drawing a crossover
point (here, the 8). The items of the first parent are copied up to the crossover point and those of
the second from there on


Fig. 10.7 Two-point crossover. Production of two "anti-twins" by randomly drawing two
crossover points (here, the 4 and 8). The items of the first parent are copied up to the first and
from the second crossover point. The intermediate items come from the second parent


10.2.2.4 OX Crossover 


For each problem, we can invent a specific crossover operator. For instance, for the
TSP, one can argue that portions of the paths should be copied from
the parents into the offspring. If a solution is a permutation of the cities, the
uniform crossover seen previously (adapted to the case of permutations)
does not really make sense, because the starting city is not decisive. What matters are the cities that precede
and follow a given city, not its absolute position in the
tour. The two-point crossover operator can be adapted for problems where the
sequences are significant.

The OX crossover operator devised for the TSP begins by copying the intermediate
portion of one parent, like the two-point crossover. The last city of this portion is
located in the other parent, and the offspring is completed by cyclically scanning the
cities of this parent and inserting those not yet included. The OX crossover operator
is illustrated in Fig. 10.8. An implementation of this operator is given in Code 10.2.


Fig. 10.8 OX crossover, specifically developed for the TSP. We start by randomly drawing two
crossover points. The intermediate portion of the first parent is copied to the offspring. In this
example, this portion ends in city 11. We locate this city in the second parent and complete the
offspring from there, in this case with city 4. The cities already appearing in the offspring (1, 2,
and 6) are skipped. When we arrive at the last city of the second parent (7), we return to the first
one (3)


Code 10.2 OX_crossover.py Implementation of the OX crossover operator, preserving a sub-path


from random_generators import unif  # Listing 12.1

######### Crossover operator preserving successive values in a permutation
def OX_crossover(parent1, parent2):  # Parent solutions
    n = len(parent1)
    # Randomly generate the portion of parent1 that is copied in child
    point1, point2 = unif(1, n - 2), unif(1, n - 3)
    if point1 >= point2:  # Ensure point1 < point2
        point1, point2 = point2, point1 + 1

    # Copy the portion of parent1 at the beginning of child
    child = [-1] * n
    inserted = [0] * n  # Flag for elements already inserted in child
    for i in range(point2 - point1 + 1):
        child[i] = parent1[i + point1]
        inserted[child[i]] = 1

    # Last element of parent2 inserted in child
    i = parent2.index(child[point2 - point1])

    # Insert remaining elements in child, in order of appearance in parent2
    nr_inserted = point2 - point1 + 1
    while nr_inserted < n:
        if not inserted[parent2[i % n]]:
            child[nr_inserted] = parent2[i % n]
            inserted[parent2[i % n]] = 1
            nr_inserted += 1
        i += 1
    return child
10.2.3 Mutation Operator


The mutation operator can be described simply in the context of this
book: it consists in randomly applying one or more local moves to the solution, as
described in Chap. 5, devoted to local searches.

The mutation operator has two roles: firstly, the local modification can improve
the solution; secondly, even if the solution is not improved, it slows down the
global convergence of the algorithm by strengthening the genetic diversity of the
population. Indeed, without this operator, the population can only lose diversity. For
instance, the crossover operators presented above systematically copy the parts
common to both parents into the offspring. Thus, some genes take over, while others
disappear as solutions are eliminated by the selection operator for
survival.

Figure 10.9 illustrates the influence of the mutation rate for a problem where
a permutation of n elements is sought. In this figure, a mutation rate of 5%
means that this proportion of the elements of the permutation is randomly swapped.
Code 10.3 gives an implementation of a mutation operator for
problems on permutations.


Fig. 10.9 Influence of the mutation rate on the quality of the solutions produced as a function of
the number of generational loops performed. Only the value of the best solution in the population is
indicated. The algorithm favors the best solutions in the population by means of selection operators
for reproduction and survival. Without mutation, the population converges relatively rapidly to
individuals that are all similar and of poor quality. The higher the mutation rate, the slower the
convergence, resulting in better solutions


Code 10.3 mutate.py Implementation of a mutation operator for problems on permutations


from random_generators import unif  # Listing 12.1

######### Random mutation of a permutation
def mutate(mutation_rate,  # Proportion of elements to move
           p):             # Permutation to mutate
    n = len(p)
    mutations = int(mutation_rate * n / 2.0)  # Each swap moves two elements
    for _ in range(mutations):
        i = unif(0, n - 1)
        j = unif(0, n - 1)
        p[i], p[j] = p[j], p[i]
    return p


10.2.4 Selection for Survival 


The last key operator in genetic algorithms is selection for survival, which aims to
bring the population back to its initial size of μ individuals after λ new solutions
have been generated. Several selection policies have been devised, depending on the
values chosen for the parameters μ and λ.


10.2.4.1 Generational Replacement 


The simplest policy for selecting the individuals who will survive is to generate the 
same number of offspring as there are individuals in the population (λ = μ). The
population at the beginning of the new generational loop is made up only of the 
offspring, the initial population disappearing. With such a choice, it is necessary to 
have a selection operator for reproduction that favors the best solutions. This means 
the best individuals are able to participate in the creation of several offspring, while 
some of the worst are excluded from the reproduction process. 


10.2.4.2 Evolutionary Strategy 


The evolutionary strategy (μ, λ) consists in generating numerous offspring (λ > μ)
and keeping only the μ best offspring for the next generation. The population
is therefore completely renewed from one iteration of the generational loop to the
next. Since this strategy is biased toward the fittest individuals from one
generation to the next, it is compatible with a uniform selection operator for
reproduction.


10.2.4.3 Stationary Replacement 


Another commonly used technique is to evolve the population gradually,
generating only a few offspring at each generational loop. One strategy is to generate λ = 2
children in each generation, which replace their parents.


10.2.4.4 Elitist Replacement 


A more aggressive strategy is to consider all the μ + λ solutions available at
the end of the generational loop and to keep only the best μ for the next generation.
This strategy was adopted to produce Fig. 10.10, which illustrates the evolution of the
fittest solution of the population for various values of μ.
Fig. 10.10 Influence of the population size on the solution quality. When the population is too
limited, it converges very rapidly with a low probability of discovering good solutions. Conversely,
a large population converges very gradually, but better solutions are obtained
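The survival policies described above differ only in which solutions compete for the μ places. The sketch below contrasts them. It assumes a `cost` function to be minimized, and all function names are hypothetical.

```python
def generational_replacement(parents, offspring):
    """lambda = mu: the offspring entirely replace the parents."""
    if len(offspring) != len(parents):
        raise ValueError("generational replacement needs lambda == mu")
    return list(offspring)


def comma_selection(offspring, mu, cost):
    """(mu, lambda) strategy: keep the mu best offspring; all parents disappear."""
    return sorted(offspring, key=cost)[:mu]


def plus_selection(parents, offspring, mu, cost):
    """Elitist replacement: keep the mu best among the mu + lambda solutions."""
    return sorted(parents + offspring, key=cost)[:mu]
```

For example, with parents [5, 3], offspring [4, 1, 9], μ = 2, and the identity as cost, `plus_selection` keeps [1, 3], whereas `comma_selection` keeps [1, 4] and loses the good parent 3.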
Code 10.4 implements an elitist replacement when λ = 1, which means that only
one offspring is produced at each generation. It replaces the worst solution in the
population (unless the offspring is even worse). This code includes a basic population
management in which all individuals must be different. To simplify the test
of equality between two solutions, they are discriminated only on the basis of their
fitness: two solutions of the same length are considered identical.


Code 10.4 insert_child.py Implementation of elitist replacement where each generation produces
only one child. This procedure implements basic population management where all individuals
must have different fitness


######### Inserting a child in a population of solutions
def insert_child(child,            # Individual to insert in population
                 child_fitness,    # Cost of child (the smaller, the better)
                 population_size,
                 population,
                 fitness,          # Fitness of each individual
                 order):           # order[i] : individual number with rank i

    rank = [-1 for _ in range(population_size)]  # Rank of individuals
    for i in range(population_size):
        rank[order[i]] = i

    child_rank = 0  # Find the rank of the child
    for i in range(population_size):
        if fitness[i] < child_fitness:
            child_rank += 1

    if child_rank < population_size - 1:  # The child is not dead-born
        # Insert only if no individual has the same fitness (no duplicates)
        if fitness[order[child_rank]] != child_fitness \
           and (child_rank == 0
                or fitness[order[child_rank - 1]] != child_fitness):
            # The child replaces the worst individual
            population[order[population_size - 1]] = child[:]
            fitness[order[population_size - 1]] = child_fitness

            for i in range(population_size):
                if rank[i] >= child_rank:
                    rank[i] += 1
            rank[order[population_size - 1]] = child_rank

            for i in range(population_size):
                order[rank[i]] = i
        else:
            child_rank = population_size  # Duplicate: child not inserted
    return child_rank, population, fitness, order


10.3 Memetic Algorithms 


Genetic algorithms have two major drawbacks: first, nothing ensures that the best
solution found cannot be improved by a simple local modification, as seen in
Chap. 5. Second, the diversity of the population declines with each iteration of the
generational loop, until the population eventually consists only of clones of the same individual.
To overcome these two drawbacks, Moscato [8] designed what he called memetic
algorithms. The first of these shortcomings is solved by applying a local search after
producing an offspring. The simplest way to avoid duplication of individuals in the
population is to eliminate them immediately, as implemented in Code 10.4.

Code 10.5 illustrates a straightforward implementation of a memetic algorithm
for the TSP. The offspring are improved by a local search based on
ejection chains. An offspring replaces the worst solution in the population only if it is
better than the latter and its evaluation differs from all those in the
population, which ensures that no duplicates are created. This algorithm implements
only an elementary version of a memetic algorithm.


Code 10.5 tsp_GA.py Implementation of a memetic algorithm for the TSP. This algorithm uses
a selection operator for reproduction based on rank. After its generation, the offspring is improved
by a local search (ejection chain method) and immediately replaces the worst solution in the
population. This algorithm has three parameters: the number μ of solutions in the population,
the number of generational loops to be performed, and the mutation rate


from random_generators import rand_permutation          # Listing 12.1
from tsp_utilities import tsp_length                    # Listing 12.2
from rank_based_selection import rank_based_selection   # Code 10.1
from OX_crossover import OX_crossover                   # Code 10.2
from mutate import mutate                               # Code 10.3
from insert_child import insert_child                   # Code 10.4
from tsp_LK import tsp_LK                               # Ejection chain local search

######### Basic Memetic Algorithm for the TSP
def tsp_GA(d,                # Distance matrix (must be symmetrical)
           population_size,  # Size of the population
           generations,      # Number of generations
           mutation_rate):   # Mutation rate

    n = len(d[0])
    population = [rand_permutation(n) for _ in range(population_size)]
    lengths = [tsp_length(d, population[i]) for i in range(population_size)]

    # Sort the individuals by increasing tour length
    order = [i for i in range(population_size)]
    for i in range(population_size - 1):
        for j in range(i + 1, population_size):
            if lengths[order[i]] > lengths[order[j]]:
                order[i], order[j] = order[j], order[i]
    print('GA initial best individual {:d}'.format(lengths[order[0]]))

    for gen in range(generations):
        parent1 = rank_based_selection(population_size)
        parent2 = rank_based_selection(population_size)
        child = OX_crossover(population[order[parent1]],
                             population[order[parent2]])
        child = mutate(mutation_rate, child)
        child_length = tsp_length(d, child)
        child, child_length = tsp_LK(d, child, child_length)
        child_rank, population, lengths, order = insert_child(child,
            child_length, population_size, population, lengths, order)
        if child_rank == 0:
            print('GA improved tour {:d} {:d}'.format(gen, child_length))
    return population[order[0]], lengths[order[0]]
Sörensen and Sevaux [11] proposed a more advanced population management.
These authors suggest evaluating, for each solution produced, a similarity measure
with the solutions contained in the population. Solutions that are too similar are
discarded to maintain sufficient diversity, so that the algorithm does not converge
prematurely.
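A possible reading of this idea for the TSP is sketched below. It measures the distance between two tours as the number of edges of one tour that are absent from the other, and accepts a new tour only if its distance to every member of the population reaches a threshold. Both the edge-based distance and the `min_distance` threshold are assumptions for illustration, not necessarily the measure used by Sörensen and Sevaux. The function names are hypothetical.

```python
def tour_edges(tour):
    """Undirected edges of a tour given as a permutation of cities."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}


def tour_distance(t1, t2):
    """Number of edges of t1 absent from t2 (0 for the same tour)."""
    return len(tour_edges(t1) - tour_edges(t2))


def accept_if_diverse(child, population, min_distance):
    """Accept child only if it differs enough from every individual."""
    return all(tour_distance(child, p) >= min_distance for p in population)
```

With this distance, rotations and reversals of a tour are recognized as the same solution, which a simple comparison of permutations would miss.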


10.4 Scatter Search 


Scatter search is almost as old as genetic algorithms. Glover [1] proposed this
technique in the context of integer linear programming. At the time, it broke certain
taboos, such as representing a solution in its natural form rather than coding it as
a binary vector, or combining more than two solutions at once, as metaphorically
illustrated in Fig. 10.11.

The main ideas of scatter search, presented in contrast to traditional genetic
algorithms, are the following:


Dispersed initial population Rather than randomly generating a large initial
population, scatter search generates it deterministically, as scattered as possible in the
space of potential solutions. These solutions are not necessarily feasible but are made so
by a repair/improvement operator.

Natural representation of solutions Solutions are represented in a natural way and 
not necessarily with binary vectors of a given size. 

Combination of several solutions More than two solutions may contribute to the 
production of a new potential solution. Rather than relying on a large population 
and a selection operator for reproduction, scatter search tries all possible 
combinations of individuals in the population, which must therefore be limited 
to a few dozen solutions. 

Repair/improvement operator Because of the natural representation of solutions,
the simultaneous combination of several individuals does not necessarily produce
a feasible solution. A repair operator projecting a potentially infeasible solution
into the space of feasible solutions is therefore expected. This operator can also
improve a feasible solution, especially by means of a local search.

Population management A reference population, of small size, is decomposed
into a subset of elite solutions (the best ones) and other solutions as different
as possible from the elites. The goal is to increase the diversity of the population
while keeping the best solutions.


Fig. 10.11 Scatter search breaks the taboo of breeding limited to two solutions


The framework of scatter search is given by Algorithm 10.2. The advantage of
this framework is its limited number of parameters: μ for the size of the reference
population and E < μ for the set of elite solutions. Moreover, μ must
be limited to about twenty, since the number of potential solutions to combine
increases exponentially with μ; this also means that the number E of elite
solutions should range from a few units to about ten.


Algorithm 10.2: Scatter search framework

Input: Size μ of the complete population, size E of the subset of elite solutions
Result: Population of solutions
1 Systematically generate a (large) population P of potential solutions as dispersed as possible
2 repeat
3   Repair and improve the solutions from P to make them feasible using the repair/improvement operator
4   Eliminate identical solutions from P
5   Identify the E best solutions from the population; they are retained in the reference set as elites
6   Identify from P the μ - E solutions that are the most different from the elite solutions; they are kept and complete the reference set
7   Combine in all possible ways the μ solutions of the reference set to obtain 2^μ - μ - 1 new potential solutions
8   Join the potential solutions to the reference set to obtain the new population P of the next iteration
9 until the population remains stable
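Step 7 combines every subset of at least two solutions of the reference set. The sketch below enumerates these subsets with the standard `itertools.combinations` and checks the count 2^μ - μ - 1 given in the algorithm. The function names are hypothetical.

```python
from itertools import combinations


def all_combinations(reference_set):
    """Yield every subset of at least two solutions of the reference set."""
    for size in range(2, len(reference_set) + 1):
        yield from combinations(reference_set, size)


def number_of_combinations(mu):
    """All 2**mu subsets minus the empty set and the mu singletons."""
    return 2 ** mu - mu - 1
```

For μ = 5 there are 26 combinations. For μ = 20 there are already 1,048,555, which is why the reference set must stay small.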


10.4.1 Illustration of Scatter Search for the Knapsack Problem 


To illustrate how the various options in the scatter search framework can be adapted 
to a particular problem, let us consider a knapsack instance: 


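$$
\begin{aligned}
\max\ r = {}& 11s_1 + 10s_2 + 9s_3 + 12s_4 + 10s_5 + 6s_6 + 7s_7 + 5s_8 + 3s_9 + 8s_{10}\\
\text{subject to: }& 33s_1 + 27s_2 + 16s_3 + 14s_4 + 29s_5 + 30s_6 + 31s_7 + 33s_8 + 14s_9 + 18s_{10} \le 100\\
& s_i \in \{0, 1\} \quad (i = 1, \dots, 10)
\end{aligned}
\tag{10.1}
$$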
10.4.1.1 Initial Population


The solutions to this problem are, therefore, ten-component binary vectors. To 
generate a set of potential solutions as scattered as possible, one can choose to 
put either all the objects in the knapsack, or one out of two, or one out of three, 
etc. For each potential solution thus generated, the complementary solution can also 
be added to the population. Naturally, not all the solutions from the population are 
feasible. To be specific, the solution with all objects does not satisfy the knapsack 
volume constraint; its complementary solution, with an objective value of zero, is 
the worst possible. 

A repair/improvement operator must therefore be applied to these potential 
solutions. This can be performed as follows: as long as the solution is not feasible, 
remove the object with the worst value/volume ratio. A feasible solution can be 
improved greedily, by including the object with the best value/volume ratio as long 
as the capacity of the knapsack permits it. This produces the population of solutions 
given in Table 10.2. 
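The generation of scattered solutions and the repair/improvement operator can be sketched for instance (10.1) as follows. The generation rule (take every k-th object for k = 1, ..., max_step, each vector followed by its complement) is one reading of the text, and the function names are hypothetical. The sketch is not claimed to reproduce Table 10.2 row by row. It does map the one-out-of-three vector and its complement to the vectors of values 38 and 44 that appear as solutions 3 and 8 in the combination example.

```python
VALUES = [11, 10, 9, 12, 10, 6, 7, 5, 3, 8]         # Objective coefficients of (10.1)
VOLUMES = [33, 27, 16, 14, 29, 30, 31, 33, 14, 18]  # Constraint coefficients of (10.1)
CAPACITY = 100


def scattered_population(n, max_step):
    """Take every k-th object (k = 1..max_step), each vector followed by its complement."""
    population = []
    for k in range(1, max_step + 1):
        s = [1 if i % k == 0 else 0 for i in range(n)]
        population.append(s)
        population.append([1 - x for x in s])
    return population


def repair_improve(s, values=VALUES, volumes=VOLUMES, capacity=CAPACITY):
    """Remove worst value/volume objects until feasible, then greedily add the best ones."""
    s = list(s)

    def ratio(i):
        return values[i] / volumes[i]

    load = sum(vol for vol, x in zip(volumes, s) if x)
    while load > capacity:  # Repair: drop the packed object with the worst ratio
        worst = min((i for i, x in enumerate(s) if x), key=ratio)
        s[worst] = 0
        load -= volumes[worst]
    for i in sorted(range(len(s)), key=ratio, reverse=True):  # Improve greedily
        if not s[i] and load + volumes[i] <= capacity:
            s[i] = 1
            load += volumes[i]
    return s


def knapsack_value(s, values=VALUES):
    """Objective value r of a binary solution."""
    return sum(v for v, x in zip(values, s) if x)
```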


10.4.1.2 Creation of the Reference Set 


Solutions 9 and 10 are identical to the first solution and are therefore eliminated. If
we choose a set of E = 3 elite solutions, these are solutions 1, 2, and 8. Assuming
that we want a reference set of μ = 5 solutions, two solutions must be added to
the three elites, chosen among solutions 3 to 7. These two solutions are determined
by evaluating a measure of dissimilarity with the elites. One approach, illustrated in
Table 10.3, is to retain the solutions maximizing the smallest Hamming distance
to the elites.
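The max-min Hamming criterion can be sketched as follows. As in the description, distances are measured to the elites only and are not updated as solutions join the reference set. Ties are broken by the order of the candidates, which is an assumption. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def most_different(candidates, elites, k):
    """Return the k candidates whose smallest Hamming distance to the elites is largest."""
    def min_distance(c):
        return min(hamming(c, e) for e in elites)
    return sorted(candidates, key=min_distance, reverse=True)[:k]
```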


Table 10.2 Initial scattered population P for the knapsack instance 10.1 and the result of applying
the repair/improvement operator on the potential solutions. Those which are not feasible are in
bold, as well as the E = 3 elite solutions


Potential solution   Value   Repaired/improved solution   Value
Table 10.3 Determining the solutions from the population that are as different as possible from
the elites. If we want a reference set of μ = 5 solutions, we retain solutions 3 and 7 in addition to
the three elites because they are those maximizing the smallest distance to one of the elites


Candidate solution   Hamming distance to elite 1, elite 2, elite 8   Minimal distance


10.4.1.3 Combining Solutions


Finally, we need an operator that creates a potential
solution by combining several solutions from the reference set. Suppose we
want to combine solutions 3, 7, and 8, of values 38, 36, and 44, respectively.
One possibility is to consider the solutions as numerical vectors and make a linear
combination of them. It is tempting to assign a weight according to each solution's
fitness. One idea is to give a weight of 38/(38 + 36 + 44) ≈ 0.322 to solution 3, of
36/(38 + 36 + 44) ≈ 0.305 to solution 7, and of 44/(38 + 36 + 44) ≈ 0.373 to solution 8.
The vector thus obtained is rounded to project it to binary values:

$$
\begin{aligned}
&0.322\cdot(1, 0, 0, 1, 0, 0, 1, 0, 0, 1)\\
+\,&0.305\cdot(0, 1, 0, 1, 0, 1, 0, 0, 0, 1)\\
+\,&0.373\cdot(0, 1, 1, 1, 1, 0, 0, 0, 1, 0)\\
=\,&(0.322, 0.678, 0.373, 1.000, 0.373, 0.305, 0.322, 0.000, 0.373, 0.627)\\
\text{Rounded: }&(0, 1, 0, 1, 0, 0, 0, 0, 0, 1)
\end{aligned}
$$
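The fitness-weighted combination and rounding can be sketched as follows. Rounding an exact 0.5 down to 0 is an assumption, since the text does not cover ties, and the function name is hypothetical. In the framework of Algorithm 10.2, such a combined vector would then go through the repair/improvement operator.

```python
def combine(solutions, values):
    """Fitness-weighted linear combination of binary vectors, rounded to 0/1."""
    total = sum(values)
    weights = [v / total for v in values]
    n = len(solutions[0])
    mixed = [sum(w * s[i] for w, s in zip(weights, solutions)) for i in range(n)]
    # Components strictly above 0.5 become 1 (an exact 0.5 becomes 0 here)
    return [1 if x > 0.5 else 0 for x in mixed]


s3 = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
s7 = [0, 1, 0, 1, 0, 1, 0, 0, 0, 1]
s8 = [0, 1, 1, 1, 1, 0, 0, 0, 1, 0]
print(combine([s3, s7, s8], [38, 36, 44]))  # [0, 1, 0, 1, 0, 0, 0, 0, 0, 1]
```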


10.5 Biased Random Key Genetic Algorithm


Biased random key genetic algorithms (BRKGA) also provide population manage- 
ment with a subset of E elite solutions that are copied to the next generation. The 
main ingredients of this technique are: 


* An array of real numbers (keys) encodes a solution. If a natural representation
of a solution is a permutation, then the permutation is the one that sorts the keys in
increasing order.

* The E best solutions from the population are kept for the next generation.

* The selection operator for reproduction always chooses one parent among the E
best.

* An offspring is generated with a uniform crossover operator, but each component
is taken from the better parent with a probability greater than 1/2.

* At each generation, λ < μ - E children are generated. These offspring replace
the non-elite solutions for the next iteration.

* The genetic diversity of the population is ensured by introducing μ - E - λ
new randomly drawn arrays (mutants); this replaces the mutation operator.


Fig. 10.12 BRKGA: elite solutions are copied from one generation to the next one; a parent
always comes from the elite; the crossover operator is biased and chooses more elements from
the best parent; the offspring is decoded by sorting the keys in increasing order; the order provides
the permutation associated with a solution


Figure 10.12 illustrates how this method operates to generate a new solution. 
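One BRKGA generation can be sketched as follows. Several choices in this version are assumptions. The second parent is drawn from the non-elite solutions, a common BRKGA practice, although the text only states that one parent is elite. `cost` is a hypothetical function evaluating the decoded key array (smaller is better). The default inheritance probability `rho = 0.7` is illustrative only.

```python
import random


def biased_crossover(elite, other, rho, rng=random):
    """Uniform crossover taking each key from the elite parent with probability rho."""
    return [e if rng.random() < rho else o for e, o in zip(elite, other)]


def brkga_generation(population, cost, n_elite, n_offspring, rho=0.7, rng=random):
    """Build the next population from a list of key arrays.

    Requires n_elite + n_offspring < len(population) (mu).
    """
    mu, n = len(population), len(population[0])
    ranked = sorted(population, key=cost)
    elites, others = ranked[:n_elite], ranked[n_elite:]
    # lambda offspring, each with one elite and one non-elite parent
    offspring = [biased_crossover(rng.choice(elites), rng.choice(others), rho, rng)
                 for _ in range(n_offspring)]
    # mu - E - lambda random mutants restore diversity
    mutants = [[rng.random() for _ in range(n)]
               for _ in range(mu - n_elite - n_offspring)]
    return elites + offspring + mutants
```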


10.6 Path Relinking 


Path relinking (PR) was proposed by Glover [2] in the context of taboo search. The
idea is to memorize a number of good solutions found by a taboo search. Two of
these solutions, which the taboo search has already linked by a path, are selected
and linked again by a new, shorter path, going from neighboring
solution to neighboring solution.
Fig. 10.13 Path relinking. A starting solution (here, the permutation of seven elements
(1, 2, 3, 4, 5, 6, 7)) is progressively transformed into a target solution (4, 6, 3, 2, 5, 1, 7) with a
neighborhood structure. At each step, the neighbor solutions that are closer to the target solution
are evaluated, and the one with the best fitness is chosen


This technique can be implemented independently of a taboo search since all 
that is needed to implement it is a population of solutions and a neighborhood 
structure. A starting solution and an ending (target) solution are chosen from the 
population. We evaluate all the neighbors of the starting solution that are closer to 
the target solution than the starting one. Among these neighbors, the one with the 
best evaluation is identified, and the process is repeated from there until we arrive at 
the target solution. With a bit of luck, one of the intermediate solutions improves the 
best solution discovered. The path relinking technique is illustrated in Fig. 10.13. 
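The procedure just described can be sketched generically for permutations. This version assumes a swap neighborhood and measures closeness to the target by the number of misplaced elements, so the candidate moves are exactly the swaps that put at least one element at its target position. Code 10.6 uses 3-opt moves for the TSP instead. `cost` is a hypothetical function to be minimized.

```python
def path_relinking(start, target, cost):
    """Walk from start to target with swap moves; return the best solution met."""
    current = list(start)
    best, best_cost = current[:], cost(current)
    where = {v: i for i, v in enumerate(current)}  # Position of each element
    while current != list(target):
        best_move, best_move_cost = None, float('inf')
        for i, v in enumerate(target):
            if current[i] != v:
                j = where[v]  # Swapping i and j puts v at its target position
                current[i], current[j] = current[j], current[i]
                c = cost(current)
                current[i], current[j] = current[j], current[i]  # Undo the trial
                if c < best_move_cost:
                    best_move, best_move_cost = (i, j), c
        i, j = best_move  # Apply the best move found
        where[current[i]], where[current[j]] = j, i
        current[i], current[j] = current[j], current[i]
        if best_move_cost < best_cost:
            best, best_cost = current[:], best_move_cost
    return best, best_cost
```

Each move reduces the number of misplaced elements by at least one, so the walk reaches the target in at most n - 1 steps.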

There are different versions of path relinking. The path can be traversed in both
directions by reversing the roles of the starting and target solutions; an improvement
method can be applied to each intermediate solution; finally, the starting and
target solutions can be modified alternately, the process stopping when they meet at
an intermediate solution. Code 10.6 provides an implementation of path relinking
for the TSP based on 3-opt moves.
Code 10.6 tsp_path_relinking.py Path relinking implementation for the TSP. At each iteration,
we identify a 3-opt move that incorporates at least one arc from the target solution to the current 
solution 


from tsp_utilities import tsp_succ_to_pred  # Listing 12.2

######### Path relinking for the TSP, based on 3-opt neighborhood
def tsp_path_relinking(d,       # Distance matrix
                       target,  # Target solution (successors)
                       length,  # Length of current solution
                       succ):   # Starting solution (successors), modified in place
    best_succ = succ[:]
    best_length = length
    pred = tsp_succ_to_pred(succ)
    best_delta = -1
    while best_delta < float('inf'):          # Until no move is left (succ == target)
        best_delta = float('inf')
        best_i = best_j = best_k = -1
        for i in range(len(succ)):            # Scan cities, stop at first improving move
            if best_delta < 0:
                break
            if succ[i] != target[i]:          # Try to introduce arc (i, target[i])
                j = pred[target[i]]
                k = target[i]
                while k != i:
                    if succ[k] != target[k]:  # Never remove an arc of the target
                        delta = d[i][succ[j]] + d[j][succ[k]] + d[k][succ[i]] \
                              - d[i][succ[i]] - d[j][succ[j]] - d[k][succ[k]]
                        if delta < best_delta:
                            best_delta = delta
                            best_i, best_j, best_k = i, j, k
                    k = succ[k]
        if best_delta < float('inf'):         # Perform the best 3-opt move found
            i, j, k = best_i, best_j, best_k
            length += best_delta
            pred[succ[i]], pred[succ[j]], pred[succ[k]] = k, i, j
            succ[j], succ[k], succ[i] = succ[k], succ[i], target[i]
            if length < best_length:
                best_length = length
                best_succ = succ[:]
    return best_succ, best_length


10.6.1 GRASP with Path Relinking 


A method using the core components of metaheuristics (construction, local search,
and management of a population of solutions) while remaining relatively simple and
with few parameters is the GRASP-PR method (greedy randomized adaptive search
procedure with path relinking) by Laguna and Marti [6]. The idea is to generate a
population P of different solutions by means of a GRASP with a parameter $\alpha$
(see Algorithm 7.8). These solutions are improved by means of a local search.

Then, a loop is repeated $I_{\max}$ times, in which a new solution is built greedily
with a random bias and also improved with a local search. Another solution of P is
then drawn at random, and a path relinking procedure is applied between both
solutions.

The best solution of the path is added to P if it is strictly better than at least one
solution of P and is not already present in P. The new solution replaces, among the
solutions of P that are worse than itself, the one that is the most different from it.
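The population update rule just described can be isolated in a few lines. In this sketch, the function name and the `distance` argument are hypothetical (for the TSP, the number of differing arcs would be a natural choice); costs are minimized and the population is modified in place.

```python
def update_population(population, costs, candidate, candidate_cost, distance):
    """GRASP-PR population update (minimization); returns True if P changed."""
    if candidate in population:                 # Already present: rejected
        return False
    worse = [i for i, c in enumerate(costs) if c > candidate_cost]
    if not worse:                               # Not better than any member
        return False
    # Among the worse members, replace the one most different from the candidate
    victim = max(worse, key=lambda i: distance(population[i], candidate))
    population[victim] = candidate
    costs[victim] = candidate_cost
    return True


def hamming(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))
```

Requiring the candidate to be strictly better than the solution it replaces, and choosing the most different one, keeps the population both good and diverse.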

Algorithm 10.3 provides the GRASP-PR framework. Code 10.7 implements a
GRASP-PR method for the TSP. The reader interested in GRASP-based
optimization tools can find extensive information in the recent book of Resende
and Ribeiro [10].


Algorithm 10.3: GRASP-PR framework

Input: GRASP procedure (with local search LS and parameter $0 < \alpha < 1$), parameters
$I_{\max}$ and $\mu$
Result: Population P of solutions
1  $P \leftarrow \emptyset$
2  while $|P| < \mu$ do
3      $s \leftarrow \mathrm{GRASP}(\alpha, LS)$
4      if $s \notin P$ then
5          $P \leftarrow P \cup s$
6  for $I_{\max}$ iterations do
7      $s \leftarrow \mathrm{GRASP}(\alpha, LS)$
8      Randomly draw $s' \in P$
9      Apply a path relinking method between $s$ and $s'$; identify the best solution $s''$ of the path
10     if $s'' \notin P$ and $s''$ is strictly better than a solution of $P$ then
11         $s''$ replaces the most different solution of $P$ which is worse than $s''$
Code 10.7 tsp_GRASP_PR.py GRASP with path relinking implementation for the TSP


from random_generators import unif                 # Listing 12.
from tsp_utilities import *                        # Listing 12.2
from tsp_GRASP import tsp_GRASP                    # Listing 7.
from tsp_path_relinking import tsp_path_relinking  # Listing 10.

######### GRASP with path relinking for the TSP
def tsp_GRASP_PR(d,                # Distance matrix
                 iterations,       # Number of calls to GRASP
                 population_size,  # Size of the population
                 alpha):           # GRASP parameter
    n = len(d[0])
    population = [[-1] * n for _ in range(population_size)]
    lengths = [-1] * population_size
    pop_size = iteration = 0
    # Build an initial population of distinct GRASP solutions
    while pop_size < population_size and iteration < iterations:
        tour, tour_length = tsp_GRASP(d, alpha)
        iteration += 1
        succ = tsp_tour_to_succ(tour)
        different = True
        for i in range(pop_size):      # tsp_compare returns 0 for identical tours
            if tsp_compare(population[i], succ) == 0:
                different = False
                break                  # The tour is already in population
        if different:
            population[pop_size] = succ[:]
            lengths[pop_size] = tour_length
            pop_size += 1
    if iteration == iterations:  # Unable to generate enough different solutions
        population_size = pop_size
    # Remaining iterations: new GRASP solution relinked toward a random member
    for it in range(iteration, iterations):
        tour, tour_length = tsp_GRASP(d, alpha)
        succ = tsp_tour_to_succ(tour)
        successors, length = tsp_path_relinking(d,
            population[unif(0, population_size - 1)], tour_length, succ)
        # Replace the most different solution among those worse than the new one
        max_difference, replacing = -1, -1
        for i in range(population_size):
            if length <= lengths[i]:
                difference = tsp_compare(population[i], successors)
                if difference == 0:    # Already in population: no update
                    max_difference = 0
                    break
                if difference > max_difference and length < lengths[i]:
                    max_difference = difference
                    replacing = i
        if max_difference > 0:
            lengths[replacing] = length
            population[replacing] = successors[:]
            print('GRASP_PR population updated:', it, length)

    best = 0
    for i in range(1, population_size):
        if lengths[i] < lengths[best]:
            best = i
    return tsp_succ_to_tour(population[best]), lengths[best]
10.7 Fixed Set Search


The Fixed Set Search (FSS) method [4] also incorporates several mechanisms
discussed in this book. First, a population of solutions is generated using
a standard GRASP procedure. Then, this population is gradually improved by
applying a GRASP procedure guided by a learning mechanism. The latter can
be seen as a form of vocabulary building: a few solutions are randomly selected
from the population, and the frequency of occurrence of the elements constituting
these solutions is calculated. Then, another solution is randomly selected from the
population. Among its elements, a fixed number are retained: those with the highest
previously calculated frequencies. The randomized greedy construction is modified
so that it produces a solution containing all these fixed elements.
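The learning step described above can be sketched as follows. The function name is hypothetical, and each solution is assumed to be a collection of elements (for the TSP, for instance, the set of its edges, each edge stored as a `frozenset` of two cities). How the fixed set is then imposed on the randomized greedy construction is not shown.

```python
import random
from collections import Counter


def fixed_set(population, sample_size, fixed_size, rng=None):
    """Select the fixed elements for the next guided construction (FSS sketch)."""
    rng = rng or random.Random()
    sample = rng.sample(population, sample_size)     # Random sample of solutions
    frequency = Counter(e for solution in sample for e in solution)
    base = rng.choice(population)                    # Solution providing the elements
    ranked = sorted(base, key=lambda e: frequency[e], reverse=True)
    return set(ranked[:fixed_size])                  # Most frequent elements of base
```

Because only elements of a single solution are kept, the fixed set is always compatible with at least one feasible solution, which keeps the guided construction feasible.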

In the case of the TSP, these elements form sub-paths. A step in the randomized 
construction adds either an edge connecting a city not in the selected sub-paths or 
all the edges of a fixed sub-path. The tour thus constructed is improved by a local 
search and enriches the population of solutions. 

The FSS method has several parameters: a stopping criterion (e.g., a number 
of iterations without improvement of the best solution), the number of solutions 
selected to determine the fixed set, the number of elements of the fixed set (which 
can vary from one iteration to another), and the $\alpha$ parameter of the randomized
construction. 

Another way of looking at FSS is to see it as an LNS-type method (Section 6.4.1) 
with learning mechanisms: the acceptance method manages a population of solu- 
tions. The destruction method chooses a random solution from the population and 
relaxes the elements that do not appear frequently in a random sample of solutions 
from the population. 


10.8 Particle Swarm 


Particle swarms are a bit special because they were first designed for continuous 
optimization. The idea is to evolve a population of particles. Their position 
represents a solution to the problem expressed as a vector of real numbers. The 
particles interact with each other. Each has a velocity in addition to its position and 
is attracted or repelled by the other particles. 

This type of method, proposed by Kennedy and Eberhart [5], simulates the 
behavior of animals living in swarms, such as birds, insects, or fish, which adopt a 
behavior that favors their survival, whether it be to feed, defend themselves against 
predators, or undertake a migration. Each individual in the swarm is influenced by 
those nearby and possibly by a leader. 

Translated into optimization terms, each particle represents a solution to the 
problem whose quality is measured by a fitness function. A particle moves at 
Fig. 10.14 Swarm particle velocity and position update


a certain speed in a given direction, but it is deflected by its environment: if
there is another particle-solution of better quality in the vicinity, it is attracted in
its direction. In that manner, each particle can be associated with a vertex of a
graph whose edges connect the particles that influence each other.

There are various variants of particle swarm methods, differing in the influence
graph and the formulae used to calculate the deviations in particle velocity $v$. In
its most classic version, a particle $p$ is influenced by only two solutions: the global
best solution $g$ found by the set of particles and the best solution $m_p$ it has found
itself. The new velocity of the particle is a vector. Each component is modified
with weights randomly drawn between 0 and $\varphi_1$ in the direction of $m_p$ and
between 0 and $\varphi_2$ in the direction of $g$, where $\varphi_1$ and $\varphi_2$ are parameters of the
method. In addition, a particle is given an inertia $\omega$ as a parameter. Figure 10.14
illustrates the update of the velocity and the position of a particle. Algorithm 10.4
provides a simple particle swarm framework.

Various modifications have been proposed to this basic version. For instance,
instead of being influenced by the best solution it has found itself, a particle
may be influenced by the best solution found by the particles in its neighborhood. It
is then necessary to define the topology that determines this neighborhood. A
common variant is to constrain the velocity of the particles to remain between
two bounds, $v_{\min}$ and $v_{\max}$. This adds two parameters to the framework. Another
proposed modification is to apply a small random shift to some particles to simulate
turbulence.
Algorithm 10.4: Swarm particle framework. $\odot$ is component-wise multiplication

Input: Function $f : [x_{\min}, x_{\max}] \subset \mathbb{R}^n \to \mathbb{R}$ to minimize, parameters $\mu, \omega, \varphi_1, \varphi_2, I_{\max}$
Result: $g$
1  $f^* \leftarrow \infty$
2  for $p = 1 \ldots \mu$ do
3      $v_p \leftarrow \mathrm{unif}(x_{\min} - x_{\max}, x_{\max} - x_{\min})$  // Initial particle velocity
4      $s_p \leftarrow \mathrm{unif}(x_{\min}, x_{\max})$  // Initial particle position (solution)
5      $m_p \leftarrow s_p$  // Best own position
6      if $f^* > f(s_p)$ then  // Update global best
7          $f^* \leftarrow f(s_p)$
8          $g \leftarrow s_p$
9  for $I_{\max}$ iterations do
10     for $p = 1 \ldots \mu$ do
11         $u_1 \leftarrow \mathrm{unif}(0, \varphi_1)$
12         $u_2 \leftarrow \mathrm{unif}(0, \varphi_2)$
13         $v_p \leftarrow \omega v_p + u_1 \odot (m_p - s_p) + u_2 \odot (g - s_p)$  // Update velocity
14         $s_p \leftarrow \max(\min(s_p + v_p, x_{\max}), x_{\min})$  // Update position
15         if $f(m_p) > f(s_p)$ then  // Update own best
16             $m_p \leftarrow s_p$
17         if $f^* > f(s_p)$ then  // Update global best
18             $f^* \leftarrow f(s_p)$
19             $g \leftarrow s_p$
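A direct Python transcription of Algorithm 10.4 may help. This is a sketch under assumptions: the function name `pso`, the scalar bounds shared by all coordinates, and the default parameter values are ours, chosen only for illustration. The values $f(m_p)$ are cached so that the own-best test does not re-evaluate $f$.

```python
import random


def pso(f, x_min, x_max, dim, mu=20, omega=0.7, phi1=1.5, phi2=1.5,
        i_max=200, rng=None):
    """Basic particle swarm minimizing f over the box [x_min, x_max]^dim."""
    rng = rng or random.Random()
    span = x_max - x_min
    # Initial velocities, positions, own bests and global best
    v = [[rng.uniform(-span, span) for _ in range(dim)] for _ in range(mu)]
    s = [[rng.uniform(x_min, x_max) for _ in range(dim)] for _ in range(mu)]
    m = [p[:] for p in s]                     # Best own positions
    m_val = [f(p) for p in s]                 # Cached values f(m_p)
    best = min(range(mu), key=lambda p: m_val[p])
    g, f_star = m[best][:], m_val[best]       # Global best
    for _ in range(i_max):
        for p in range(mu):
            for c in range(dim):
                u1 = rng.uniform(0, phi1)     # Component-wise random weights
                u2 = rng.uniform(0, phi2)
                v[p][c] = (omega * v[p][c] + u1 * (m[p][c] - s[p][c])
                           + u2 * (g[c] - s[p][c]))
                s[p][c] = max(min(s[p][c] + v[p][c], x_max), x_min)
            value = f(s[p])
            if value < m_val[p]:              # Update own best
                m[p], m_val[p] = s[p][:], value
            if value < f_star:                # Update global best
                g, f_star = s[p][:], value
    return g, f_star
```

For instance, `pso(lambda x: sum(c * c for c in x), -5.0, 5.0, 2)` should return a point close to the origin. As in Algorithm 10.4, the global best is updated inside the sweep over the particles, so later particles of the same iteration already benefit from it.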


10.8.1 Electromagnetic Method 


In the electromagnetic method, a particle induces a force of attraction or repulsion 
on all the others. This force depends on the inverse of the square of the distance 
between the particles, like electrical forces. The direction of the force depends on 
the quality of the solutions. A particle is attracted by a solution that is better than 
itself and repelled by a worse solution. 
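A minimal sketch of the force computation described above, assuming unit charges. Electromagnetism-like methods in the literature commonly weight each force by charges that depend on solution quality; that refinement is omitted here, and the function name is ours.

```python
import math


def electromagnetic_force(i, positions, values):
    """Resultant force on particle i under minimization (unit charges)."""
    dim = len(positions[i])
    force = [0.0] * dim
    for j, pj in enumerate(positions):
        if j == i or values[j] == values[i]:
            continue                          # Neither better nor worse
        diff = [pj[c] - positions[i][c] for c in range(dim)]
        dist = math.sqrt(sum(x * x for x in diff))
        if dist == 0.0:
            continue
        sign = 1.0 if values[j] < values[i] else -1.0   # Attract or repel
        for c in range(dim):
            # Unit vector diff / dist scaled by the magnitude 1 / dist**2
            force[c] += sign * diff[c] / dist ** 3
    return force
```

With particles at (0, 0), (2, 0) and (0, −1) whose values are 5, 1 and 9, the first particle is pulled toward the better one with magnitude 1/4 and pushed away from the worse one with magnitude 1, giving the force (0.25, 1.0).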


10.8.2 Bestiary 


In previous sections, we have only mentioned the basic algorithms, inspired by the 
behavior of social animals, and a variant, inspired by a process of physics. Different 
authors have proposed many metaheuristics whose framework is similar to that of 
Algorithm 10.4. 




What distinguishes them is essentially the way of initializing the speed and the 
position of the particles (lines 3 and 4) as well as the “magic formulas” for their 
updates (lines 13 and 14). 

These various magical formulas are inspired by the behavior of various animal
species or by physical processes. To name just a few, there are amoeba, bacteria,
bat, bee, butterfly, cockroaches, cuckoo, electromagnetism, firefly, and mosquito. 
There are various variants of these frameworks, obtained by hybridizing them with 
the key components of the metaheuristics discussed in this book. There are hundreds 
of proposals in the literature suggesting “new” metaheuristics inspired by various 
metaphors, sometimes even referring to the behavior of mythic creatures! 

Very schematically, it is a matter of applying the intensification and diversifica- 
tion principles: elimination of certain solutions from the population, concentration 
toward the best discovered solutions, random walk, etc. 

A number of these frameworks have been proposed in the context of continuous 
optimization. To adapt these methods to discrete optimization, one can implement a 
coding scheme, for example, the random keys seen in Section 10.5. Another solution 
is to consider the notion of neighborhood and path relinking. The reader who is a 
friend of animals and other creatures may consult [7] for a bestiary overview. 
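As a reminder of how random keys bridge continuous and discrete spaces, a real vector can be decoded into a permutation by sorting the indices by key value. The function name is ours. A continuous method, particle swarm included, can then move in $[0, 1]^n$ while the fitness is evaluated on the decoded permutation.

```python
def random_keys_to_permutation(keys):
    """Decode a random-key vector: element order = indices sorted by key."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


keys = [0.42, 0.07, 0.93, 0.35]
print(random_keys_to_permutation(keys))   # [1, 3, 0, 2]
keys[2] = 0.10                            # A small continuous move of one key
print(random_keys_to_permutation(keys))   # [1, 2, 3, 0]
```

Any real vector decodes into a valid permutation, so no repair is needed after a continuous move; the price is that many different vectors map to the same permutation.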

Rather than trying to devise a new heuristic based on an exotic metaphor using 
obscure terminology, we encourage the reader to use a standardized description, 
following the basic principles presented in this book. Indeed, during the last quarter 
century, there have been few truly innovative new concepts. It is a matter of adopting 
a more scientific posture, of justifying the choices of problem modeling, of estab- 
lishing test protocols, etc., even if the development of a theory of metaheuristics still 
seems very far away and the heuristic solution of real-world optimization problems 
remains the only option. 


Problems 


10.1 Genetic Algorithm for a One-Dimensional Function 

We need to optimize a function $f$ of an integer variable $x$, $0 \le x < 2^n$. In the
context of a genetic algorithm with a standard crossover operator, how to encode x 
in the form of a binary vector? 


10.2 Inversion Sequence 

A permutation $p$ of elements from 1 to $n$ can be represented by an inversion
sequence $s$, where $s_i$ counts the number of elements of $p_1, \ldots, p_k = i$ that are
greater than $i$. For example, the permutation $p = (2, 4, 6, 1, 5, 3)$ has the inversion
sequence $s = (3, 0, 3, 0, 1, 0)$: there are three elements greater than 1 before 1,
0 elements greater than 2 before 2, etc. To which permutations do the inversion 
sequences (4, 2, 3, 0, 1, 0) and (0, 0, 3, 1, 2, 0) correspond? Provide necessary and 
sufficient conditions for a vector s to be an inversion sequence corresponding to a 
permutation. Can the standard 1-point, 2-point, and uniform crossover operators be 
applied to inversion sequences? How can inversion sequences be used in the context
of scatter search? 


10.3 Rank Based Selection 
What is the probability of the function rank_based_selection(m), given in
Algorithm 10.1, to return a given value v? 


10.4 Tuning a Genetic Algorithm 
Adjust the population size and mutation rate of the procedure tsp_GA given by
Code 10.5, if it generates a total of 5n children.


10.5 Scatter Search for the Knapsack Problem 

Consider the knapsack instance 10.1 of Section 10.4. Perform the first iteration of 
a scatter search for this instance: generate the new population, repair/improve the 
solutions, update the reference set consisting of five solutions with three elites. 
Chapter 11
Heuristics Design


This chapter gives some tips for developing heuristics. It goes back to the modeling 
of the problem and gives an example of decomposing the problem into a chain 
of sub-problems that are easier to treat. It then proposes an approach to designing 
a specific heuristic. Finally, techniques for parameter tuning and comparison of 
algorithms are discussed. 


11.1 Problem Modeling 


Now that we have reviewed the key ingredients of heuristics, let us try to propose an 
approach to design one. The first thing to determine is whether a heuristic approach 
is absolutely necessary. Indeed, the “no free lunch” theorem [5] informs us that no
optimization heuristic outperforms all others! This result follows from the fact that,
in the infinite variety of instance data, most instances have no exploitable structure.
Any heuristic, no matter how sophisticated, therefore makes an unfortunate choice
for some data set. Among the infinite number of imaginable heuristics, there is at
least one that does not include this inappropriate choice.

If one has to solve a concrete optimization problem, it is therefore necessary to 
“cheat.” Examples are the set bipartition and the knapsack problems discussed in 
Sect. 1.2.3.5. If we know that the data do not contain large values, it is useless to 
design a heuristic because we can solve this type of problem exactly by dynamic 
programming. 

The way in which the problem is modeled is crucial to its successful resolution. 
Especially when dealing with a concrete logistics problem, it can be time-consuming 
and tedious to capture all the wishes of a manager who is not used to the narrow view 
of an optimization engineer, who thinks in terms of variables, objective function, and
constraints. 
To do this, one must first identify the variables of the problem. One must
determine what can be modified and what is immutable and part of the data. Then, the
constraints must be discussed, specifying those that are hard and must be respected 
for a solution to be operational. Often, constraints presented as indispensable 
correspond rather to good practices, from which it is possible to depart from time 
to time. These soft constraints are generally integrated into an objective with a 
penalty factor. As we have seen with the Lagrangian relaxation technique, hard 
constraints can also be introduced into an objective, but probably with a higher 
penalty weighting. Finally, in practice, there are several objectives to be optimized. 
However, a manager may not be very happy if provided with a huge set of Pareto 
optimal solutions (see Problem 11.1) and has to examine all of them before choosing 
one. On the other hand, it will be easier to prioritize the objectives. It remains to 
be seen whether these objectives should be treated in a hierarchical manner (the 
optimum of the highest priority objective is sought before optimizing the next 
one) or by scalarization (all the objectives are aggregated into one, with weights 
in relation to their priority). 
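The two treatments can select different solutions from the same data. In this toy sketch, the function names and the data are hypothetical: solutions are pairs (cost, overtime hours), both to be minimized, cost having the highest priority.

```python
def lexicographic_best(solutions, objectives):
    """Hierarchical treatment: objectives compared in priority order."""
    return min(solutions, key=lambda s: tuple(f(s) for f in objectives))


def scalarized_best(solutions, objectives, weights):
    """Scalarization: weighted sum of the objectives."""
    return min(solutions,
               key=lambda s: sum(w * f(s) for w, f in zip(weights, objectives)))


a, b = (100, 9), (101, 0)                 # Hypothetical (cost, overtime) pairs
objectives = [lambda s: s[0], lambda s: s[1]]
print(lexicographic_best([a, b], objectives))        # (100, 9)
print(scalarized_best([a, b], objectives, [1, 1]))   # (101, 0)
```

The hierarchical treatment accepts 9 hours of overtime to save one unit of cost, whereas equal weights prefer the second solution; the choice of weights thus encodes how much of one objective the manager is ready to trade for another.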

Once the problem has been properly identified, the designer of a heuristic 
algorithm must choose a model that is appropriate for the solution method. The 
following section illustrates two remarkably similar models of the same problem 
that can lead to the design of very different algorithms. 


11.1.1 Model Choice 


To reconstruct an unknown genetic sequence, a DNA microarray chip, able to 
react to all k-nucleotide sequences, is exposed to the gene to be discovered. Once 
revealed, this chip allows knowing all subsequences of k-nucleotides present in the 
gene to be analysed. 

The data can be modeled using de Bruijn graphs. These graphs represent the 
superposition of symbol chains. 

A first model associates an arc with each k-nucleotide detected. So, the
3-nucleotide AAC is represented by an arc from vertex AA to vertex AC, the two
overlapping on the middle A. If m is the number of k-nucleotides detected, we have a
graph with m arcs. The reconstruction problem is to find an Eulerian path (passing
exactly once through every arc) in this graph. This problem is easy: it can be solved
in linear time.

The other model associates a vertex with each k-nucleotide detected. An arc 
connects two vertices if the associated k-nucleotides have a common subsequence 
of k − 1 nucleotides. For instance, if the 3-nucleotides AAC, ACA and ACG are
detected, then both arcs AAC → ACA and AAC → ACG are present in the
graph due to the common AC superposition. The graph is a directed version of the 
line graph of the previous representation. The reconstruction problem is to discover 
a Hamiltonian path (passing once through all the vertices). This second modeling 
thus requires the resolution of an NP-complete problem. 




[Figure 11.1: reconstructed sequences ACGCAAACACTTA and ACACGCAAACTTA]


Fig. 11.1 Two de Bruijn graph models for the reconstruction of a genetic sequence. Top right, a
graph in which an arc connecting two (k − 1)-nucleotides represents a detected k-nucleotide (with
k = 3). With this model, we have to find an Eulerian path in the graph. The numbering of the
arcs corresponds to one of the shortest possible sequences. Bottom left is a graph in which the
vertices represent the detected k-nucleotides. An arc represents a (k + 1)-nucleotide that could be in
the sequence. With this model, we have to discover a Hamiltonian path. The colored arcs provide
the path of the other shortest sequence


That said, this second model is not necessarily to be discarded in practice. Indeed, 
a concrete sequencing problem is likely to possess peculiarities and constraints that
might be more difficult to deal with using the first model. For example, a genetic 
sequence may include repeated subsequences, a more or less reliable quantification 
of the number of times a k-nucleotide appears, etc. 

Figure 11.1 shows the graphs that can be constructed with these two models for 
a gene that has activated 11 subsequences of 3-nucleotides. In this example, it is not 
possible to unambiguously reconstruct the gene. 
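The first model lends itself to a short implementation. This sketch (function name ours) builds the de Bruijn multigraph on (k − 1)-nucleotides and extracts an Eulerian path with Hierholzer's algorithm, assuming such a path exists. Applied to the 11 3-nucleotides of the first sequence of Fig. 11.1, it returns one of the sequences compatible with the data.

```python
from collections import defaultdict


def reconstruct(kmers):
    """Sequence spelled by an Eulerian path of the de Bruijn graph whose
    vertices are (k-1)-mers and whose arcs are the detected k-mers."""
    graph = defaultdict(list)               # Vertex -> list of successors
    balance = defaultdict(int)              # Out-degree minus in-degree
    for kmer in kmers:
        u, v = kmer[:-1], kmer[1:]
        graph[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # An Eulerian path starts at the vertex with one surplus outgoing arc, if any
    start = next((u for u in graph if balance[u] == 1), next(iter(graph)))
    stack, path = [start], []
    while stack:                            # Hierholzer's algorithm
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())    # Follow an unused arc
        else:
            path.append(stack.pop())        # No unused arc left: vertex is placed
    path.reverse()
    return path[0] + ''.join(vertex[-1] for vertex in path[1:])


sequence = 'ACGCAAACACTTA'                  # First sequence of Fig. 11.1
kmers = [sequence[i:i + 3] for i in range(len(sequence) - 2)]
print(reconstruct(kmers))
```

The printed sequence has 13 nucleotides and exactly the same 3-nucleotides as the input, but it need not be the original one: which Eulerian path is found depends on the order in which arcs are followed, which illustrates the ambiguity just mentioned.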

The choice of a model is often sensitive and depends on the techniques to be
implemented. During the design of a heuristic, it is common to realize that another
model is more appropriate. In any case, it is worth keeping in mind that good
solutions are located at the boundary between feasible and infeasible ones. To
illustrate this point, the 2-opt neighborhood for the TSP examines only feasible
solutions. Although 2-opt moves are the elementary operations of the Lin-Kernighan
neighborhood, 2-opt is much less efficient. The success of Lin-Kernighan is undoubtedly
due to the fact that it examines reference structures that are not tours. In a way, feasible
solutions are approached from outside the domain of definition.

When a few constraints are numerical, the possibility of implementing the
Lagrangian relaxation technique discussed in Sect. 2.8 should be studied. By
adjusting the value of the penalties associated with the violation of the relaxed
constraints, the heuristic focuses its search in the vicinity of the boundary of feasible
solutions.

Conversely, artificial constraints can also be added to implement the diversifi- 
cation technique outlined in Sect. 9.2. For example, the quality of the solutions 
generated by Algorithms 2.7 and 2.8 for clustering can be significantly improved 
by adding a soft constraint imposing that the groups must contain the same number 
of elements. This constraint is relaxed and introduced in the fitness function with a 
penalty parameter that decreases during the iterations, just like the temperature of 
a simulated annealing. Thus, by performing a few more iterations than the original 
method, the solution is perturbed by making the largest groups smaller and the very 
small groups larger. At the end of the algorithm, the penalties being very low, we 
return to the initial objective, but the centers have managed to better relocate. The 
local optimum thus obtained is considerably better than that of the original method 
which does not move significantly enough from the initial random solution. 
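One simple way to approximate the balance constraint described above is sketched below. It is an assumption on our part, a k-means-style loop that is not necessarily Algorithm 2.7 or 2.8: when a point is assigned, joining a group costs the squared distance to its center plus a penalty proportional to the number of points already placed in that group during the current pass. The penalty decreases geometrically, so the last iterations optimize almost the plain k-means objective. Names and default values are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_kmeans(points, k, penalty=10.0, decay=0.8, iterations=30, rng=None):
    """k-means-style clustering with a decreasing soft balance penalty."""
    rng = rng or random.Random()
    centers = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        counts = [0] * k
        for i, x in enumerate(points):
            # Distance to the center plus a penalty growing with the group size
            j = min(range(k), key=lambda c: squared_distance(x, centers[c])
                                           + penalty * counts[c])
            assignment[i] = j
            counts[j] += 1
        for c in range(k):                  # Move each center to its group mean
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        penalty *= decay                    # Like a cooling temperature
    return centers, assignment
```

This penalized assignment depends on the order in which points are processed; it is only meant to show how a relaxed constraint with a vanishing weight perturbs the centers before the original objective takes over.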


11.1.2 Decomposition into a Series of Sub-problems 


Another step in modeling is to assess whether it is possible to apply the divide 
and conquer principle. Rather than trying to design a complex problem model with 
multiple interrelationships, one can try breaking it down into a series of more easily 
addressed sub-problems. 

An example is the vehicle routing problem and its extension to several ware- 
houses that need to be positioned. Rather than solving the positioning of the 
warehouses and the construction of the routes simultaneously, one can initially only 
deal with the customers constituting natural groups. The creation of the routes can be 
done in a second step and then finally the positioning of the warehouses. Figure 11.2 
illustrates the process of this decomposition into a succession of sub-problems. 
With this approach, it was possible to handle examples with millions of elements 
with a reasonable computational effort. At the time these results were obtained, the 
instances in the literature were 1000 or 10,000 times smaller. 


11.2 Algorithmic Construction 


Once the problem has been modeled and a fitness function has been chosen, the
construction of an algorithm can start. The first step is to construct a solution.
Chapter 4 suggests diverse ideas to realize this step. Beforehand, if the instance
is large, it is necessary to consider another kind of problem partitioning: slicing the
data.


Fig. 11.2. Decomposition of the location-routing problem into a sequence of sub-problems. (a) 
Generation of customer clusters with a rapid clustering method. (b) Generation of routes for each 
customer cluster. (c) Positioning of warehouses and re-optimization of routes 


11.2.1 Data Slicing 


A partition of the problem items should be considered if the volume of data is very 
large or if a procedure of high algorithmic complexity is to be implemented. For 
the location-routing problem discussed in the previous section, a proximity graph 
presented in Sect. 6.3.1 was used. The size of the clusters created was chosen 
according to the problem data, so that their volume was close to a (small) multiple 
of the volume of the vehicles. The routing heuristic has also been designed in 
anticipation of optimizing a few tours at a time. It may therefore be appropriate 
to slice the data even for relatively small instances. 


11.2.2 Local Search Design 


Purely constructive heuristic algorithms rarely produce solutions of acceptable 
quality for difficult optimization problems. Even when combined with learning 
methods such as pheromone trails in artificial ant colonies or population manage- 
ment with genetic algorithms, convergence can be slow and the resulting solutions 
of insufficient quality. However, some software libraries can automatically generate
a genetic algorithm when given only a procedure decoding a solution from a binary
vector and one calculating the fitness function. In such situations, the coding effort
for the algorithm can be significantly reduced.

Generally, it is essential to move on to the subsequent phase and devise a 
neighborhood for the problem to be solved. The success of a heuristic algorithm 
frequently depends on the design of the envisaged neighborhood(s). Metaheuristics 
are sometimes described as processes guiding a local search. This explains the 
relative length of Chap. 5 and the richness of the various techniques allowing 
their limitation or extension. However, we are well aware that not everyone
possesses the genius of Lin and Kernighan to design such an efficient ejection chain 
for the traveling salesman problem. Perhaps the latter is an exception, as it does not 
need to rely on other concepts to achieve excellent quality solutions. 

As it is usually not possible to find a neighborhood with all the appropriate 
characteristics (connectivity, small diameter, fast evaluation, etc.), a local search 
often uses several different neighborhoods. Each of them corrects a weakness of the 
others. 

This implies thinking about strategies for their use: one has to decide whether 
the local search should evaluate all these neighborhoods at each iteration or 
whether one should alternate phases of intensification, using one neighborhood, and 
diversification of the search, using other neighborhoods. 
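One classical strategy for combining neighborhoods is variable neighborhood descent: the neighborhoods are explored in a fixed order, the search returns to the first one after any improvement, and it stops when none improves. The sketch below is one possible implementation, with best improvement inside each neighborhood; the function names and the toy permutation problem are ours.

```python
def variable_neighborhood_descent(solution, cost, neighborhoods):
    """Descent alternating neighborhoods; each one is a generator function."""
    current, current_cost = solution, cost(solution)
    k = 0
    while k < len(neighborhoods):
        best, best_cost = None, current_cost
        for neighbor in neighborhoods[k](current):   # Best improvement in N_k
            c = cost(neighbor)
            if c < best_cost:
                best, best_cost = neighbor, c
        if best is None:
            k += 1                                   # N_k exhausted: next one
        else:
            current, current_cost, k = best, best_cost, 0
    return current, current_cost


def adjacent_swaps(p):
    for i in range(len(p) - 1):
        q = p[:]
        q[i], q[i + 1] = q[i + 1], q[i]
        yield q


def all_swaps(p):
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            q = p[:]
            q[i], q[j] = q[j], q[i]
            yield q


def displacement(p):
    return sum(abs(v - i) for i, v in enumerate(p))


print(variable_neighborhood_descent([2, 0, 1], displacement,
                                    [adjacent_swaps, all_swaps]))  # ([0, 1, 2], 0)
```

Placing the small, fast neighborhood first means the larger one is only evaluated at local optima of the first.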


11.3 Heuristics Tuning 


Theoretically, a programmer who is not an expert on the problem to be solved 
should be able to design a heuristic based on the key principles of metaheuristics 
discussed in the previous chapters. In practice, during the algorithm development, 
the programmer is going to gain some experience on how to achieve good solutions 
for the specific type of data to be processed. 

Indeed, the most time-consuming work in the design of a heuristic algorithm 
consists in trying to understand why the constructive method goes wrong and 
produces outliers, why the “brilliant” neighborhood does not give the predicted results
or why the “infallible” learning method fails. . . 


11.3.1 Instance Selection 


The design of a heuristic algorithm begins with the selection of the problem 
instances that it should be able to successfully tackle. Indeed, the “no free lunch” 
theorem tells us it is an illusion to try to design a universally successful method. 
When addressing an industrial problem, once we have managed to translate 
the customer's wishes into a model that seems reasonable, we still need to obtain 
concrete numerical data. This sometimes problematic phase can require a
disproportionate investment, especially if the model developed is too complex, with poor
decomposition and not sufficiently focused on the core of the problem. In practice, 
it frequently occurs that constraints described as essential are not imperative but 
result from a fear of altering too drastically the current operating mode and habits. 
Conversely, the first feedback from solutions provided by a heuristic algorithm may 
also reveal constraints that have not been explicitly stated. In this case, it is necessary 
to go back to the modeling stage and repeat an iteration... 

For an academic problem, there are usually libraries with many numerical data. 
In this case, a selection of the instances must be considered so as not to invest an 
infinite amount of time tuning the algorithm. To evaluate the proper implementation 
of the developed heuristic, a first selection should consider moderate size instances 
for which an optimal or a very good solution is identified. 

The instance selection should also be able to highlight the pathological cases for
the developed heuristic. This choice must also be governed by the relevance of these
examples for practical cases. One example is the satisfiability problem with 3 literals
per clause (3SAT). If the number of randomly generated clauses is large compared
to the number of variables, the instances are easy: the probability that a feasible
assignment of the variables exists tends very quickly to 0. Conversely, if the number
of clauses is small compared to the number of variables, the instances are equally
easy: the probability that a feasible assignment exists tends very quickly toward 1.
It was determined that the transition between simple and intractable instances occurs
when the number of clauses is about 4.24 times the number of variables. This result is
interesting in itself, but even if an efficient heuristic is developed for this type of
instance, there is no guarantee that it will be efficient for practical applications.
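Random 3SAT instances at a chosen clause-to-variable ratio are easy to generate, which makes it simple to place test instances on either side of the transition. In this sketch (function names ours), each clause draws three distinct variables and negates each with probability 1/2; literal +v (resp. −v) means that variable v is true (resp. false).

```python
import random


def random_3sat(n_vars, ratio, rng=None):
    """Random 3SAT instance with round(ratio * n_vars) clauses."""
    rng = rng or random.Random()
    clauses = []
    for _ in range(round(ratio * n_vars)):
        variables = rng.sample(range(1, n_vars + 1), 3)   # 3 distinct variables
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def satisfied(clauses, assignment):
    """Number of clauses satisfied by assignment (dict: variable -> bool)."""
    return sum(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)
```

Counting satisfied clauses gives a natural fitness for a heuristic; instances generated with a ratio far from the transition are a poor test bed, for the reasons given above.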

Finally, results for problem instances with very diverse characteristics should be 
separately reported. Indeed, multiplying the number of favorable (e.g., very small) 
or unfavorable instances would lead to biased results. 


11.3.2 Graphical Representation 


When possible, a graphical representation of the solutions helps to understand how
a heuristic works and to correct its flaws. Indeed, it is common to imagine wrong
causes for poor results. By visualizing the output of the method as something
other than a series of numbers, the actual reason for a poor performance sometimes
becomes obvious.

For some problems, a graphical representation is natural, as for the Euclidean 
TSP. This certainly explains the excellent efficiency of the heuristic and exact 
methods that have been developed for this problem. 
11.3.3 Parameter and Option Tuning


The most time-consuming step in the design of a heuristic is the selection of its 
ingredients and parameter tuning. When developing a heuristic, one desires it to 
provide results of the highest possible quality on as wide an instance range as 
possible. 

Initially, the programmer can proceed intuitively to find parameter values that
fulfil this purpose. Typically, a small instance is chosen and the heuristic is executed
with various parameter values or options. The most promising changes are retained.
Put differently, tuning consists of applying a heuristic to another problem whose
variables are the parameter values and whose objective function is the result of
running the program that implements the algorithm under development.
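This view of tuning as an optimization problem can be made concrete. The following is a minimal sketch, assuming a finite set of candidate values per parameter and a hypothetical function `run_heuristic(params, instance)` that returns the cost of the solution found (to be minimized). Plain random search stands in here for whatever heuristic one would actually apply to the tuning problem.

```python
import random
import statistics

def tune_random_search(run_heuristic, param_space, instances, budget, seed=0):
    """Tuning seen as optimization: the variables are parameter values and the
    objective is the mean cost returned by run_heuristic over the instances.
    Plain random search over a finite parameter space."""
    rng = random.Random(seed)
    best_params, best_score = None, float('inf')
    for _ in range(budget):
        # Each evaluation of the tuning objective is a full set of heuristic runs
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = statistics.mean(run_heuristic(params, inst) for inst in instances)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy stand-in for a heuristic run whose cost is minimal for a = 3 and b = instance
def toy_run(params, instance):
    return (params['a'] - 3) ** 2 + (params['b'] - instance) ** 2

print(tune_random_search(toy_run, {'a': [1, 2, 3, 4], 'b': [0, 1, 2]}, [1], 200))
```

Each evaluation of the tuning objective costs one run per instance, which is why the search space for the parameters should remain much smaller than that of the problem itself.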

In principle, the search space for tuning variables is considerably smaller than 
that of the instance to be solved. If not, the question arises as to the relevance of a 
heuristic whose design could be more complicated than the problem to be solved. 

Several illustrations in this book provide the results of extensive experiments
on the influence of the values of one or two parameters of some heuristics. For
example, Fig. 9.3 shows that for a TSP instance with n = 127 cities, the appropriate
combinations of the parameters dmin and Δ seem to satisfy Δ + 2dmin = n,
provided that one performs 107 iterations of the taboo search.

These results are not intended to provide definitive values. They are presented so 
that the reader can get an idea of the appropriate values, but they are not necessarily 
generalizable. The production of such a figure requires a disproportionate effort 
(more than 10,000 executions of the heuristic and then production of the diagram) 
compared to the information that can be obtained. However, it does allow us to 
observe significant random fluctuations in the results obtained. 

If the heuristic has more than half a dozen parameters and options, a rough 
intuitive tuning is likely to be biased: 


* Given the effort involved, few alternatives are tested. 

* The instance set is limited. 

* The heuristic only works well on a limited instance set. 

* Outstanding or bad results focus attention. 

* Results are neither reproducible nor statistically supported. 


It is therefore recommended to use automated methods to calibrate the parameters,
providing these methods with a sufficiently representative instance set. They
have the advantage of being objective: they neither focus on a very favorable
instance set nor leave out a very unfavorable instance. Among parameter-tuning
software, we can mention irace, proposed by [2].
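Tuners such as irace rely on racing: candidate configurations are evaluated instance by instance, and those that are clearly worse are discarded early so that the budget is concentrated on promising ones. The sketch below is a much simplified illustration of that idea and not irace's algorithm: it replaces the statistical tests used by real racing methods with a naive relative-gap threshold. The names `run`, `tolerance` and `min_instances` are hypothetical, and costs are assumed positive and minimized.

```python
import statistics

def simple_race(configs, instances, run, min_instances=3, tolerance=0.05):
    """Race configurations over instances, one instance at a time.

    After `min_instances` instances, a configuration is eliminated when its
    mean cost exceeds the best mean by more than `tolerance` (relative).
    Costs are assumed positive and to be minimized."""
    alive = list(range(len(configs)))
    results = {i: [] for i in alive}
    for k, instance in enumerate(instances, start=1):
        for i in alive:  # Only surviving configurations are evaluated
            results[i].append(run(configs[i], instance))
        if k >= min_instances:
            means = {i: statistics.mean(results[i]) for i in alive}
            best = min(means.values())
            alive = [i for i in alive if means[i] <= best * (1 + tolerance)]
    return [configs[i] for i in alive]

# Toy usage: the cost grows with the configuration value, so 1 should win
print(simple_race([1, 2, 3], [10, 20, 30, 40], lambda c, x: c * x))
```

A fixed threshold ignores the random fluctuations mentioned above; a statistical test decides elimination far more reliably.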
11.3.4 Measure Criterion


The design of heuristics is usually a multi-objective process. Indeed, the vast
majority of the framework algorithms discussed in Part III of this book include
a parameter that directly influences the number of repetitions of a general loop.
Consequently, one can choose the computational effort to solve a problem quite
freely. Furthermore, the quality of the solution produced depends directly on this
effort. An extreme case is given by simulated annealing, which almost surely
produces the best possible solution, provided that one accepts an infinite effort!

A compromise must therefore be achieved between the computational time and 
the quality of the solutions produced. 


11.3.4.1 Success Rate 


A first measure of the quality of a heuristic is its success rate in producing target 
solutions. These may be the optimum, if known, or solutions of a given quality. If 
the value of the optimum is unknown, a bound can be derived using a relaxation, 
from which a certain deviation can be accepted. 

The simplest case for comparing success rates occurs when the heuristics have
no parameters, or when their parameters have been fixed. In this case, we want to
answer the question: does heuristic A find more target solutions than heuristic B?
The procedure is straightforward: we run A and B on the same set of instances
and count the number of successes of each. Obviously, for this to make sense,
the instances must be chosen prior to the experiment, and not according to the results
obtained by one or the other method.

As in any experiment of this type, a subsidiary question must be answered: is the
observed difference in success rates significant? Indeed, if the heuristics include a
random component, or if the instance set is randomly selected, the difference could
be due to chance rather than to a genuine difference in solving performance between
the heuristics.

In this case, a statistical test can be carried out, with the null hypothesis that both 
methods have exactly the same probability p of success [4]. To conduct such a test, 
the independence of the experiments should be guaranteed. Under these conditions, 
relatively few numerical experiments can reveal a significant difference. 

Table 11.1 provides the values for which it can be stated with 99% confidence
that one proportion is significantly higher than another. This table can be used as
follows: suppose we want to compare methods A and B on instances drawn at
random (e.g., TSPs with 100 cities uniformly generated in the unit square). Suppose
that method B was able to find a solution of a given quality only twice in n_B = 5
runs, and that, out of n_A = 10 runs of method A, 9 achieved such quality. In the
corresponding row and column of Table 11.1, we find the couple (10, 2). In other
words, a proportion of at least 10/10 would have been needed to conclude that
method A is superior to method B.
Fig. 11.3 Success rate for the optimal resolution of the TSP instance tsp225 by three iterative
methods. The learning processes of the FANT (Code 8.3) and GRASP-PR (Code 10.7) methods 
demonstrate a reasonable efficiency. Indeed, independent runs of Code 12.3, starting from 
randomly generated solutions, present a much lower success rate 


Put differently, there exists a common probability p of success for which, in more
than 1% of cases, one observes nine successes out of ten runs for one method and two
successes out of five runs for the other. This result is counterintuitive, as the observed
success rates differ by a factor larger than two.
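The test of [4] is not reproduced here. To illustrate the same kind of reasoning, the sketch below computes the one-sided p-value of Fisher's exact test, a standard and rather conservative test of the null hypothesis that both methods have the same success probability, against the alternative that A succeeds more often. It conditions on the total number of successes and sums the hypergeometric probabilities of all outcomes at least as favorable to A as the observed one. Since it is a different test, its verdict need not coincide with Table 11.1; the function name is hypothetical.

```python
from math import comb

def fisher_one_sided(successes_a, runs_a, successes_b, runs_b):
    """One-sided p-value of Fisher's exact test for H0: same success
    probability, against H1: method A succeeds more often than method B."""
    total_runs = runs_a + runs_b
    total_successes = successes_a + successes_b
    denominator = comb(total_runs, total_successes)
    k_max = min(runs_a, total_successes)
    # Probability of observing successes_a or more successes for A, given the total
    return sum(comb(runs_a, k) * comb(runs_b, total_successes - k)
               for k in range(successes_a, k_max + 1)) / denominator

print(fisher_one_sided(9, 10, 2, 5))   # example of the text: 9/10 versus 2/5
print(fisher_one_sided(10, 10, 2, 5))  # 10/10 versus 2/5
```

For 9/10 versus 2/5, the p-value is 105/1365 ≈ 0.077, far above the 1% level. Even 10/10 versus 2/5 gives 10/455 ≈ 0.022, so Fisher's test is more conservative than the values of Table 11.1 in this case.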

The situation becomes more complex if the success rate depends on the
computational effort spent to reach a target solution. One possibility is to plot
the proportion of successes versus the effort (time-to-target plot or TTT-plot).
Figure 11.3 illustrates this for three different heuristics on a small TSP instance.
For this figure, an iteration corresponds to one call to Code 12.3. The
generation time of the starting solution has been neglected here. This solution is
generated either in a purely random way, or with an artificial pheromone matrix, or
with a randomized greedy algorithm. We also ignore that the local search may take
more or less time to complete, depending on the starting solution.
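For independent runs, the empirical curve behind a TTT-plot can be computed from the number of iterations each run needed to reach the target. The following minimal sketch assumes these counts have already been recorded, with `None` for runs that did not reach the target within the budget; the function name is hypothetical.

```python
def empirical_success_curve(iterations_to_target, max_effort):
    """Proportion of runs that reached the target within e iterations,
    for e = 1, ..., max_effort. Runs that never reached it are given as None."""
    n_runs = len(iterations_to_target)
    reached = sorted(t for t in iterations_to_target if t is not None)
    curve, j = [], 0
    for effort in range(1, max_effort + 1):
        while j < len(reached) and reached[j] <= effort:
            j += 1  # One more run has reached the target at this effort
        curve.append(j / n_runs)
    return curve

# Four runs: targets reached after 3, 1 and 5 iterations; one run failed
print(empirical_success_curve([3, 1, None, 5], 5))
```

Plotting `curve` against the effort gives one curve of a diagram such as Fig. 11.3.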

The success rate curve for the method repeating local searches from randomly
generated solutions was obtained by estimating the probability of a successful run.
This estimation required 100,000 executions of the method, with only 14 achieving
the target.

The success rate of x executions was therefore estimated to be $1 - (1 - 0.00014)^x$.
However, this mode of representation is questionable, as the target value and the
chosen instance can greatly influence the results. Moreover, the compared methods
should require approximately the same computational effort per iteration.
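The formula for x independent executions can be evaluated directly, and inverted to find how many executions are needed for a given success rate. A small sketch, assuming (as in the text) independent runs with a single-run success probability of 14/100,000; the function names are hypothetical.

```python
import math

P_SINGLE = 14 / 100_000  # Estimated probability that one run reaches the target

def success_rate(x, p=P_SINGLE):
    """Probability that at least one of x independent runs reaches the target."""
    return 1 - (1 - p) ** x

def runs_needed(target_rate, p=P_SINGLE):
    """Smallest number of runs x such that success_rate(x, p) >= target_rate."""
    return math.ceil(math.log(1 - target_rate) / math.log(1 - p))

print(round(success_rate(1000), 3))
print(runs_needed(0.5))
```

With this estimate, 1000 executions succeed only about 13% of the time, and about 4951 executions are needed for an even chance of reaching the target.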
11.3.4.2 Computational Time Measure


Whenever possible, one should favor an absolute measure of the computational
effort, for example, the number of iterations performed. Obviously, one must
specify what influences the computational effort, typically the size of the data
to be processed. The algorithmic complexity of an iteration should therefore be
indicated.

This complexity is sometimes not clearly identifiable, or its theoretical expression
bears little relation to practical observations. We have already mentioned the simplex
algorithm for linear programming, which can theoretically perform an exponential
number of pivots but in practice stops after an almost linear number of steps.

To compare the speed of heuristics, one is sometimes forced to use a relative
measure: the computational time. For the same algorithm using the same data
structures with the same algorithmic complexity, the computation time depends on
many factors, including:


* The programming language used for its implementation 
* The hardware (processor, memory, etc.) 

* The programming style 

* The interpreter or compiler 

* The interpretation or compilation options 

* The operating system 

* Running the system in energy-saving mode 

* Other independent processes running in parallel 

* The BIOS configuration 


Obtaining reliable computing times can be a challenge. For example,
the motherboards of personal computers are often configured at the factory
to run in "turbo" mode: when the processor is not heavily used, its clock
frequency drops, which reduces energy consumption. When intensive computations
start, the first iterations can therefore take much longer than the following
ones, although they perform the same number of operations. The maximum clock
frequency may also depend on whether a laptop runs on battery or on an external
power supply.

Thus, a factor of 2 can indeed be observed for the execution of a procedure on 
two machines with the same hardware (or even on the same machine). The factor 
can rise to more than 100 if we compare two similar implementations but not using 
the same programming language. 

To obtain meaningful results, it is frequently necessary to repeat runs for the 
same set of parameters if the measured times are less than 1 second. In all cases, 
it should be kept in mind that the computational time remains a relative measure. 
What is important is the evolution of time according to the characteristics of the 
problems being solved. One essential characteristic is the data size. 
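Some of these precautions can be built into the measurement itself. The following is a minimal sketch, assuming a generic Python callable `func` (hypothetical): a few warm-up calls first, so that the processor can leave its low-frequency state, then several measurements, each repeating the call until a minimum duration has elapsed. It reports the minimum and the median time per call. It cannot, of course, neutralize the hardware, compiler, or operating system factors listed above.

```python
import statistics
import time

def measure(func, *args, warmup=2, repeats=5, min_time=1.0):
    """Time func(*args). After `warmup` untimed calls, perform `repeats`
    measurements, each looping the call until at least `min_time` seconds
    have elapsed. Return the minimum and the median time per call (seconds)."""
    for _ in range(warmup):
        func(*args)
    per_call = []
    for _ in range(repeats):
        calls, start = 0, time.perf_counter()
        while True:
            func(*args)
            calls += 1
            elapsed = time.perf_counter() - start
            if elapsed >= min_time:
                break
        per_call.append(elapsed / calls)
    return min(per_call), statistics.median(per_call)

# Usage: time the sorting of a reversed list of 100,000 integers
print(measure(sorted, list(range(100_000, 0, -1)), repeats=3, min_time=0.2))
```

Looping until `min_time` addresses the advice to repeat runs when a single execution takes less than a second.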

In Fig. 11.4, we have plotted the running time of some codes proposed in this
book as a function of the number of cities in the problem. This figure uses two
logarithmic scales. In this way, a polynomial growth of the computational time is
represented, asymptotically, by a straight line. The slope of this line indicates the
degree of the polynomial. It can be noted that all methods behave polynomially.

[Figure 11.4: log-log plot of the computational time [s] versus the number of cities n for Pilot, 2-opt best, Lin-Kernighan, LK from NN, 2-opt first and Nearest neighbor; the legend gives the empirical complexity O(n^b) observed for each method.]

Fig. 11.4 Evolution of the computational time as a function of the number of TSP cities for some
codes presented in this book

The reader who wishes to reproduce this figure should be able to do so without 
too much difficulty using the codes provided. The degree of the polynomials 
observed is likely to be close to that presented, but the time scale may be very 
different depending on the programming language and the configuration of the 
computer used. 
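The degree of the polynomial can be estimated as the least-squares slope of log(time) against log(size). A minimal sketch, assuming lists of instance sizes and measured times (the data in the usage line are synthetic); since the straight line only appears asymptotically, only the larger sizes should be retained in practice.

```python
import math

def empirical_order(sizes, times):
    """Least-squares slope of log(time) versus log(size), i.e. the estimated
    degree b of a polynomial growth time ~ c * size**b."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# Synthetic times growing as n**2.5
sizes = [100, 200, 400, 800, 1600]
print(empirical_order(sizes, [3e-6 * n ** 2.5 for n in sizes]))
```

The constant factor (here 3e-6) only shifts the line vertically: it depends on the language and computer, while the slope does not.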


11.3.4.3 Solution Quality Measure 


Comparing the quality of solutions produced by fully determined heuristics (with no
free parameters) is relatively simple. Each heuristic is run on one or more instance
sets with similar characteristics, and the values of the objective function produced
are recorded. The most common measure is the average. However, determining whether
the averages of the values produced by two heuristics differ significantly
requires a statistical test. If we can reasonably assume that the values produced are
normally distributed, there are standard tests that are relatively simple to implement,
such as Student's t-test. If more than two methods are to be compared, a more
elaborate test, typically an analysis of variance (ANOVA), must be conducted.

If the values produced do not follow a normal distribution, particular caution
should be exercised, as a single extreme value can significantly alter an average. In
this case, it is more reliable to use another measure, for example, the median, with
Friedman's analysis of variance. With this test, a rank is assigned to each run of
a method and the null hypothesis states that all samples come from populations with
the same median.

Bootstrapping is a very general statistical technique that is fairly simple to
implement and is particularly suitable for comparing methods for which it is not
reasonable to obtain a large number of runs. A quantity such as the mean, the
median, or their confidence interval is estimated by drawing a large number of
samples, with replacement, from the small number of observations made. The quantity
to be estimated is calculated for each of these resamples, and the mean of these
values provides an estimator of this quantity. To obtain a confidence interval, it is
sufficient to identify the quantiles of the resampling distribution.
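The procedure just described fits in a few lines. Here is a minimal percentile-bootstrap sketch using only the standard library; the function name, its default values and the tour lengths in the usage line are hypothetical.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, n_resamples=10_000,
                 confidence=0.95, seed=0):
    """Percentile bootstrap: resample the observations with replacement,
    compute `stat` on each resample, and read the confidence interval from
    the quantiles of the resampled statistics. Also returns their mean."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    alpha = (1 - confidence) / 2
    lower = estimates[int(alpha * n_resamples)]
    upper = estimates[int((1 - alpha) * n_resamples) - 1]
    return statistics.mean(estimates), (lower, upper)

# Five tour lengths observed for a method (illustrative values)
print(bootstrap_ci([3920, 3935, 3916, 3941, 3928]))
print(bootstrap_ci([3920, 3935, 3916, 3941, 3928], stat=statistics.median))
```

Passing `stat=statistics.median` gives the robust variant recommended above for non-normal data.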

When the heuristics are not completely determined, for example, if one wishes 
to provide a more complete picture of the evolution of the results as a function of 
the computational effort, the statistical tests mentioned above must be repeated for 
each computational effort. There are convenient tools to perform this automatically 
and provide diagrams with confidence intervals. 

Figure 11.5 illustrates the evolution of the objective function for the same
methods as in Fig. 11.3. As all three methods were run on the same machine, they
can be compared on the basis of computational time. This figure gives a double
scale for the abscissa: computational time and number of calls to the descent
to a local optimum. For this diagram, the reference scale is time; the iteration
scale refers to the first method, GRASP-PR.

This reveals an increase in time of a few percent for the execution of the path
relinking method and a decrease for the fast ant system, because the solutions
generated with the pheromone trails are closer to local optima, which speeds up the
descent method. This diagram gives a significantly different insight into the
behavior of these methods. Indeed, a misinterpretation of Fig. 11.3 would suggest
that up to 1000 iterations, the FANT method is better than GRASP-PR and that,
beyond that, the latter is the best, while repeating the improvement method from
random solutions is much worse.

Figure 11.5 shows that up to about 50 iterations, the three methods do not produce
solutions of statistically different value. Only from 300 iterations onward can we
clearly state that multiple descents are less efficient than GRASP-PR. The curves
of the latter two methods cross at several points, but it is not possible to state that
one produces solutions with a significantly lower average value than the other for
fewer than 300 iterations. For more details on the generation of this type of
diagram, the reader can refer to [3], and to [1] for bootstrapping techniques.
Fig. 11.5 Evolution of the average tour length as a function of the number of iterations for
some codes presented in this book. Each method was run 20 times independently on the tsp225 
instance. The shaded areas give the 95% confidence interval of the mean, obtained by exploiting a 
resampling technique 


Problems 


11.1 Data Structure Options for Multi-objective Optimization 

It is argued in Sect. 5.5.3 that using a simple linked list to store the Pareto set may 
be inefficient. Is the more complicated KD-tree implementation really justified? To 
answer this question, evaluate the number of solutions produced by the Pareto local 
search Code 12.8, as well as the number of times one has to compare a neighbor 
solution to one of the solutions stored in this set. Deleting an element from a KD- 
tree can also be costly as a whole subtree has to be examined, and this can lead to 
cascading deletions. With a linked list, deleting a given element is done in constant 
time. Also assess the extra work involved. 


11.2 Comparison of a True Simulated Annealing and a kind of SA with 
Systematic Neighborhood Evaluation 

Compare the simulated annealing Code 7.1 and the noising method Code 7.2
when executed under the following conditions: instance with 50 cities and random
distances generated uniformly between 1 and 99 (call to the rand_sym_matrix
function); start with a random solution (rand_permutation function); initial
temperature: tour length/50; final temperature: tour length/2500; α = 0.999.
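As a quick check of these settings, one can count the temperature reductions between the initial and final temperatures. This is a minimal sketch, assuming that the temperature is multiplied by α at each step; how many moves Codes 7.1 and 7.2 perform at a given temperature is not assumed here, and the function name is hypothetical.

```python
def temperature_steps(tour_length, alpha=0.999):
    """Number of geometric reductions T <- alpha * T needed to go from
    tour_length / 50 down to tour_length / 2500."""
    t, t_final, steps = tour_length / 50, tour_length / 2500, 0
    while t > t_final:
        t *= alpha
        steps += 1
    return steps

print(temperature_steps(1000))
```

The result, 3911 reductions, does not depend on the tour length: only the ratio of 50 between the initial and final temperatures matters, and ln 50 / (-ln 0.999) ≈ 3910.1.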
Chapter 12 Codes


This appendix provides the codes of utility procedures (random number generation,
TSP data structures, KD-tree) appearing in various algorithms discussed in this
book. Then, several codes are provided for testing complete metaheuristics.

These codes have been simplified in such a way that a user who is not familiar 
with coding or the use of program libraries can quickly execute them. Their 
programming style is therefore not so exemplary! 


12.1 Random Numbers 


Code 12.1 random_generators.py Implementation of a pseudo-random generator due to
L'Ecuyer [1] as well as utility functions to generate integers between two bounds, random
permutations and random symmetric matrices


```python
######### L'Ecuyer random generator; returns a value in ]0; 1[
def rando():
    m, m2 = 2147483647, 2145483479
    a12, a13, a21, a23 = 63308, -183326, 86098, -539608
    q12, q13, q21, q23 = 33921, 11714, 24919, 3976
    r12, r13, r21, r23 = 12979, 2883, 7417, 2071
    invm = 4.656612873077393e-10
    # First component; Schrage's decomposition computes a*x mod m
    # without large intermediate products
    h = rando.x10 // q13
    p13 = -a13 * (rando.x10 - h * q13) - h * r13
    h = rando.x11 // q12
    p12 = a12 * (rando.x11 - h * q12) - h * r12
    if p13 < 0: p13 = p13 + m
    if p12 < 0: p12 = p12 + m
    rando.x10, rando.x11, rando.x12 = rando.x11, rando.x12, p12 - p13
    if rando.x12 < 0: rando.x12 = rando.x12 + m
    # Second component
    h = rando.x20 // q23
    p23 = -a23 * (rando.x20 - h * q23) - h * r23
    h = rando.x22 // q21
    p21 = a21 * (rando.x22 - h * q21) - h * r21
    if p23 < 0: p23 = p23 + m2
    if p21 < 0: p21 = p21 + m2
    rando.x20, rando.x21, rando.x22 = rando.x21, rando.x22, p21 - p23
    if rando.x22 < 0: rando.x22 = rando.x22 + m2
    # Combination of both components
    if rando.x12 < rando.x22: h = rando.x12 - rando.x22 + m
    else: h = rando.x12 - rando.x22
    if h == 0: return 0.5
    else: return h * invm

rando.x10, rando.x11, rando.x12 = 12345, 67890, 13579  # Seeds, 1st component
rando.x20, rando.x21, rando.x22 = 24680, 98765, 43210  # Seeds, 2nd component

######### Returns a random integer in [low; high]
def unif(low, high):
    return low + int((high - low + 1) * rando())

######### Returns a random permutation of the elements 0...n-1
def rand_permutation(n):
    p = [i for i in range(n)]
    for i in range(n - 1):
        random_index = unif(i, n - 1)
        p[i], p[random_index] = p[random_index], p[i]
    return p

######### Returns a symmetric n x n matrix of random numbers with 0 diagonal
def rand_sym_matrix(n, low, high):
    matrix = [[0 for _ in range(n)] for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = unif(low, high)
    return matrix
```

Electronic Supplementary Material The online version of this article (https://doi.org/10.1007/978-3-031-13714-3_12) contains supplementary material, which is available to authorized users.


12.2 TSP Utilities 


Code 12.2 tsp_utilities.py Utility functions for the traveling salesman: computation of the length
of a tour when a solution is provided in the form of an array giving the order in which the cities are
visited; transformation of a solution from one form to another (order, successors, predecessors);
comparison of tours

```python
######### Compute the length of a TSP tour
def tsp_length(d,      # Distance matrix
               tour):  # Order of the cities
    n = len(tour)
    length = d[tour[n - 1]][tour[0]]
    for i in range(n - 1):
        length += d[tour[i]][tour[i + 1]]
    return length

######### Build solution representation by successors
def tsp_tour_to_succ(tour):
    n = len(tour)
    succ = [-1] * n
    for i in range(n):
        succ[tour[i]] = tour[(i + 1) % n]
    return succ

######### Build solution representation by predecessors
def tsp_succ_to_pred(succ):
    n = len(succ)
    pred = [-1] * n
    for i in range(n):
        pred[succ[i]] = i
    return pred

######### Convert solution from successor of each city to city order
def tsp_succ_to_tour(succ):
    n = len(succ)
    tour = [-1] * n
    j = 0                    # The tour starts with city 0
    for i in range(n):
        tour[i] = j
        j = succ[j]
    return tour

######### Convert a solution given by 2-opt data structure to a standard tour
def tsp_2opt_data_structure_to_tour(t):
    n = int(len(t) / 2 + 0.5)
    tour = [-1] * n
    j = 0
    for i in range(n):
        tour[i] = j // 2     # Index j of the structure encodes city j // 2
        j = t[j]
    return tour

######### Compare 2 directed tours; returns the number of different arcs
def tsp_compare(succ_a, succ_b):
    n = len(succ_a)
    count = 0
    for i in range(n):
        if succ_a[i] != succ_b[i]:
            count += 1
    return count
```


12.3 TSP Lin and Kernighan Improvement Procedure 


Code 12.3 tsp_LK.py Ejection chain for the TSP

```python
from tsp_utilities import *  # Listing 12.2

######### Basic Lin & Kernighan improvement procedure for the TSP
def tsp_LK(D,       # Distance matrix
           tour,    # Solution
           length): # Tour length
    n = len(tour)
    succ = tsp_tour_to_succ(tour)
    tabu = [[0 for _ in range(n)] for _ in range(n)]  # Can edge i-j be removed?
    iteration = 0     # Outermost loop counter to identify tabu condition
    last_a, a = 0, 0  # Initiate ejection chain from city a = 0
    improved = True
    while a != last_a or improved:
        improved = False
        iteration += 1
        b = succ[a]
        path_length = length - D[a][b]
        path_modified = True
        while path_modified:  # Identify best ref. struct. with edge a-b removed
            path_modified = False
            ref_struct_cost = length  # Cost of reference structure retained
            c = best_c = succ[b]
            while succ[c] != a:  # Ejection can be propagated
                d = succ[c]
                if path_length - D[c][d] + D[c][a] + D[b][d] < length:
                    best_c = c  # An improving solution is identified
                    ref_struct_cost = path_length - D[c][d] + D[c][a] + D[b][d]
                    break       # Change improving solution immediately
                if tabu[c][d] != iteration and \
                   path_length + D[b][d] < ref_struct_cost:
                    ref_struct_cost = path_length + D[b][d]
                    best_c = c
                c = d  # Next value for c and d
            if ref_struct_cost < length:  # Admissible reference structure found
                path_modified = True
                c, d = best_c, succ[best_c]  # Update reference structure
                tabu[c][d] = tabu[d][c] = iteration  # Don't remove edge c-d again
                path_length += D[b][d] - D[c][d]
                i, si, succ[b] = b, succ[b], d  # Reverse path b -> c
                while i != c:
                    succ[si], i, si = i, si, succ[si]
                b = c
        if path_length + D[a][b] < length:  # A better solution is found
            length = path_length + D[a][b]
            succ[a], last_a = b, b
            improved = True
            tour = tsp_succ_to_tour(succ)
        succ = tsp_tour_to_succ(tour)  # Restore successors of the current tour
        a = succ[a]
    return tour, length
```


12.4 KD-Tree Insertion and Inspection 


Code 12.4 kd_tree_add_scan.py Codes to define the general structure of the nodes of a KD-tree,
to add an element to a KD-tree, and to inspect the whole tree. The inspection procedure just prints
out the elements

```python
######### KD tree node data structure
class Node:
    def __init__(self, key, father, info):  # Create a tree node
        self.key = key[:]       # Key used to separate nodes
        self.father = father    # Father of the node (None if node is the root)
        self.info = info[:]     # Information to store in the node (list)
        self.left = None        # Left son of node
        self.right = None       # Right son

K = 3  # Define the dimension of the KD tree

######### Add a new node in a KD tree
def kd_tree_add(root,    # Root of a (sub-) tree
                key,     # Key for splitting nodes
                info,    # Information to store in the node
                depth):  # Depth of the root
    if root is None:
        root = Node(key, None, info)
    elif root.key[depth % K] < key[depth % K]:
        if root.right is None:
            root.right = Node(key, root, info)
        else:
            root.right = kd_tree_add(root.right, key, info, depth + 1)
    else:
        if root.left is None:
            root.left = Node(key, root, info)
        else:
            root.left = kd_tree_add(root.left, key, info, depth + 1)
    return root

######### Scan a KD tree and print key and info for each node
def kd_tree_scan(root):
    if root:
        if root.left:
            kd_tree_scan(root.left)
        print('Key: ', root.key, ' Info: ', root.info)
        if root.right:
            kd_tree_scan(root.right)
```

12.5 KD-Tree Delete 


Code 12.5 kd_tree_delete.py Code for removing a node in a KD-tree

```python
######### Find the node with min or max value in a given dimension
def kd_tree_find_opt(root,     # Root of a KD (sub-)tree
                     dim,      # Dimension in which optimum is looked for
                     depth,    # Depth of the root
                     minimum,  # Look for minimum (True) or maximum (False)
                     value,    # Best value already known
                     opt):     # Node with optimum value
    depth_opt = -1  # Stays -1 if no better node is found in this subtree
    if (minimum and (value > root.key[dim])) \
       or ((not minimum) and (value < root.key[dim])):
        opt = root
        value = root.key[dim]
        depth_opt = depth
    for son in (root.left, root.right):
        if son:
            son_opt, son_value, son_depth = kd_tree_find_opt(son, dim, depth + 1,
                                                             minimum, value, opt)
            if son_depth >= 0:  # A better node was found in this son's subtree
                opt, value, depth_opt = son_opt, son_value, son_depth
    return opt, value, depth_opt

######### Delete the root of a KD (sub-)tree
def kd_tree_delete(root,    # Node to delete
                   depth):  # Depth of the node
    from kd_tree_add_scan import K  # Listing 12.4
    if root.left:  # Root is an internal node, must be replaced
        replacing, val_repl, depth_repl = kd_tree_find_opt(root.left,
            depth % K, depth + 1, False, float('-inf'), None)
    elif root.right:
        replacing, val_repl, depth_repl = kd_tree_find_opt(root.right,
            depth % K, depth + 1, True, float('inf'), None)
    else:  # The node is a leaf
        if root.father:
            if root.father.left is root:
                root.father.left = None
            else:
                root.father.right = None
        return None  # A leaf is directly deleted
    root.key = replacing.key[:]
    root.info = replacing.info
    kd_tree_delete(replacing, depth_repl)
    return root
```


12.6 KD-Tree Update Pareto Set 


Code 12.6 kd_tree_update_pareto.py Code for updating a Pareto set represented by a KD-tree

```python
from kd_tree_add_scan import K, kd_tree_add  # Listing 12.4

######### Tell if node is in the box bounded by minimum and maximum
def kd_tree_in(node,
               minimum, maximum):  # Lower and upper corner of the box
    i, result = 0, True
    while i < K and result:
        result = minimum[i] <= node.key[i] <= maximum[i]
        i += 1
    return result

######### Find a node (if any) with its depth in the box bounded by mini and maxi
def kd_tree_find(root,        # Root of the tree in which the node is looked for
                 mini, maxi,  # Lower and upper corner of the box
                 depth):      # Depth of the root
    if root is None:
        return None, -1
    if kd_tree_in(root, mini, maxi):
        return root, depth
    if maxi[depth % K] >= root.key[depth % K]:  # Right subtree may meet the box
        result, depth_found = kd_tree_find(root.right, mini, maxi, depth + 1)
        if result:
            return result, depth_found
    if mini[depth % K] <= root.key[depth % K]:  # Left subtree may meet the box
        result, depth_found = kd_tree_find(root.left, mini, maxi, depth + 1)
        if result:
            return result, depth_found
    return None, -1

######### Remove points of Pareto front dominated by costs (if any)
def update_3opt_pareto(pareto,     # Current Pareto front to update
                       costs,      # New point to be eventually added
                       successors, distances):  # Problem solution and data
    from kd_tree_delete import kd_tree_delete    # Listing 12.5
    from tsp_3opt_pareto import tsp_3opt_pareto  # Listing 5.5
    minimum = [0 for _ in range(K)]
    maximum = [float('inf') for _ in range(K)]
    dominant, depth = kd_tree_find(pareto, minimum, costs, 0)
    if dominant is None:  # No point of pareto dominates costs
        while True:  # There are dominated points, costs improves pareto
            dominated, depth = kd_tree_find(pareto, costs, maximum, 0)
            if dominated is None:  # All dominated points removed
                break
            if dominated is pareto:
                pareto = kd_tree_delete(dominated, depth)
            else:
                kd_tree_delete(dominated, depth)
        pareto = kd_tree_add(pareto, costs, successors, 0)
        pareto = tsp_3opt_pareto(pareto, costs, successors, distances)
    return pareto
```


12.7 TSP 2-Opt and 3-Opt Test Program 


Code 12.7 test tsp 2 and 3opt.py This code first generates a symmetric matrix with random 
distances and starts with a random solution. The latter is improved with a local search applying 
the first-move improving policy with the 2-opt neighborhood. This method is relatively rapid for 
instances with up to a few thousand cities. Then, all sub-paths of 100 successive cities in this 
solution are improved with a 3-opt neighborhood. This method runs in almost linear time, but 
only produces good solutions if the starting solution is adequate. The solution is then improved 
with a full 3-opt neighborhood. Its complexity is considerably higher; the computational time 
becomes significant beyond a few hundred cities. Ultimately, the solution is improved with the 
2-opt neighborhood, but applying the best move policy at each iteration 


2| Programme to test various local improvement methods 
Example of execution: 

4| Number of cities: 

s| 500 

6| Random solution: 24565 
Cost of solution found with 2-opt first 953 

8| Solution improved with 3-opt limited (100 cities) 738 

9| Solution improved with complete 3-opt 679 

10| Solution improved with 2-opt best 673 


ulii 


import math

from random_generators import *                  # Listing 12.1
from tsp_utilities import *                      # Listing 12.2
from tsp_2opt_first import tsp_2opt_first        # Listing 5.4
from tsp_2opt_best import tsp_2opt_best          # Listing 5.1
from tsp_3opt_limited import tsp_3opt_limited    # Listing 6.1
from tsp_3opt_first import tsp_3opt_first        # Listing 5.2


print('Number of cities: ')
n = int(input())

distances = rand_sym_matrix(n, 1, 99)
solution = rand_permutation(n)

length = tsp_length(distances, solution)
print('Random solution: {:d}'.format(length))

solution, length = tsp_2opt_first(distances, solution, length)
print('Cost of solution found with 2-opt first: {:d}'.format(length))

successors = tsp_tour_to_succ(solution)
successors, length = tsp_3opt_limited(distances, 100, successors, length)
print('Solution improved with 3-opt limited (100 cities): {:d}'.format(length))

successors, length = tsp_3opt_first(distances, successors, length)
print('Solution improved with complete 3-opt: {:d}'.format(length))

solution = tsp_succ_to_tour(successors)
solution, length = tsp_2opt_best(distances, solution, length)
print('Solution improved with 2-opt best: {:d}'.format(length))


12.8 Multi-objective TSP Test Program 


Code 12.8 test_tsp_3opt_pareto.py A program for testing a Pareto local search for a TSP with a
randomly generated distance matrix. For a 20-city and 3-objective instance, this program generates
an approximation to the Pareto set with more than 3000 solutions. Since the implementation is
highly recursive, the Python recursion limit and the stack size allowed by the console (user limits)
must be enlarged accordingly.


"""
Programme to test Pareto local search for the TSP with 3-opt moves
Example of run with K = 3 (KD-tree Key = costs; Info = tour)
Number of cities:
6
Key: [193, 287, 236] Info: [...]
Key: [222, 288, 235] Info: [...]
Key: [263, 249, 242] Info: [...]
Key: [199, 265, 244] Info: [...]
Key: [182, 224, 320] Info: [...]
Key: [184, 244, 297] Info: [...]
Key: [166, 340, 264] Info: [...]
Key: [246, 367, 201] Info: [...]
"""

import sys

from random_generators import rand_sym_matrix    # Listing 12.1
from kd_tree_add_scan import K, kd_tree_scan     # Listing 12.4
from tsp_3opt_pareto import tsp_3opt_pareto      # Listing 5.5

sys.setrecursionlimit(50000)  # Highly recursive implementation; enlarge stack!
print('Number of cities: ')
n = int(input())

distance = [rand_sym_matrix(n, 1, 99) for _ in range(K)]
successors = [(i + 1) % n for i in range(n)]  # Initial solution

costs = [0 for _ in range(K)]
for dim in range(K):
    for i in range(n):
        costs[dim] += distance[dim][i][successors[i]]

pareto = tsp_3opt_pareto(None, costs, successors, distance)

kd_tree_scan(pareto)  # Print Pareto front with tours (successors representation)
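In the program above, the KD-tree of Listing 12.4 is used to check quickly whether a new cost vector is dominated by the archive and to remove the archived points it dominates. To show only the underlying logic, here is a minimal list-based sketch running in quadratic time, assuming all objectives are minimized; the helper names `dominates` and `update_archive` are hypothetical and do not belong to the book's listings.

```python
def dominates(a, b):
    """True if cost vector a Pareto-dominates b (all objectives minimized):
    a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive, costs, info):
    """Insert (costs, info) into a list of mutually non-dominated points.

    Returns False, leaving the archive unchanged, if an archived point
    dominates costs or is equal to it; otherwise removes every archived
    point dominated by costs, appends the new point and returns True."""
    for archived_costs, _ in archive:
        if archived_costs == costs or dominates(archived_costs, costs):
            return False
    archive[:] = [(c, i) for c, i in archive if not dominates(costs, c)]
    archive.append((costs, info))
    return True
```

As in the listing, a new point is added only when no archived point is at least as good in every objective.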


12.9 Fast Ant TSP Test Program 


Code 12.9 test_tsp_FANT.py Program to test a method inspired by artificial ant colonies.


"""
Programme to test the Fast Ant procedure
Example of execution:
Number of cities:
200
Number of FANT iterations:
200
FANT parameter:
30
FANT 1 314
FANT 2 310
FANT 75 308
FANT 175 306
Cost of solution found with FANT: 306
"""


from random_generators import rand_sym_matrix    # Listing 12.1
from tsp_FANT import tsp_FANT                    # Listing 8.3

print('Number of cities: ')
n = int(input())
print('Number of FANT iterations: ')
fant_iterations = int(input())
print('FANT parameter (best solution reinforcement): ')
fant_parameter = int(input())

distances = rand_sym_matrix(n, 1, 99)
tour, cost = tsp_FANT(distances, fant_parameter, fant_iterations)
print('Cost of solution found with FANT: {:d}'.format(cost))


12.10 Taboo Search TSP Test Program 


Code 12.10 test_tsp_TS.py A taboo search test program for a TSP with a randomly generated
symmetric distance matrix.


"""
Programme to test a Taboo Search for the TSP
Example of run:
Number of cities:
30
Number of tabu iterations:
200
Minimum tabu duration:
4
Maximum tabu duration:
20
Penalty:
0.005
TS 1: 1799
TS 2: 1053
TS 3: 906
TS 4: 796
TS 29: 177
TS 46: 174
Cost of solution found: 173
[...]
"""


import math

from random_generators import *              # Listing 12.1
from tsp_utilities import tsp_length         # Listing 12.2
from tsp_TS import tsp_TS                    # Listing 9.1

print('Number of cities: ')
n = int(input())

distances = rand_sym_matrix(n, 1, 99)
solution = rand_permutation(n)
length = tsp_length(distances, solution)

print('Number of tabu iterations: ')
iterations = int(input())
print('Minimum tabu duration: ')
min_tabu = int(input())
print('Maximum tabu duration: ')
max_tabu = int(input())
print('Penalty: ')
freq_penalty = float(input())

solution, length = tsp_TS(distances, solution, length,
                          iterations, min_tabu, max_tabu, freq_penalty)
print('Cost of solution found: {:d}'.format(length))
print(solution)


12.11 Memetic TSP Test Program


Code 12.11 test_tsp_GA.py A memetic algorithm test program for a TSP with a randomly
generated symmetric distance matrix.


"""
Programme to test a basic memetic algorithm
Example of execution:
Number of cities:
200
Size of the population:
10
Mutation rate:
0.02
Number of generations:
30
GA initial best individual 9424
GA improved tour 0 313
GA improved tour 7 312
GA improved tour 15 307
Cost of solution found with GA: 307
"""


from random_generators import rand_sym_matrix    # Listing 12.1
from tsp_GA import tsp_GA                        # Listing 10.5

print('Number of cities: ')
n = int(input())
print('Size of the population: ')
population_size = int(input())
print('Mutation rate: ')
mutation_rate = float(input())
print('Number of generations: ')
nr_generations = int(input())

distances = rand_sym_matrix(n, 1, 99)
_, cost = tsp_GA(distances, population_size, nr_generations, mutation_rate)
print('Cost of solution found with GA: {:d}'.format(cost))


12.12 GRASP with Path Relinking TSP Test Program 


Code 12.12 test_tsp_GRASP_PR.py A GRASP with path relinking test program. This method
uses a GRASP that calls a local search based on ejection chains, as well as other utility
functions.


"""
Programme to test a GRASP with Path Relinking
Example of execution:
Number of cities:
200
Iterations:
50
Population size:
10
Alpha:
0.7
GRASP_PR population updated: 11 319
GRASP_PR population updated: 12 317
GRASP_PR population updated: 13 318
GRASP_PR population updated: 15 318
GRASP_PR population updated: 16 320
GRASP_PR population updated: 18 317
GRASP_PR population updated: 19 315
GRASP_PR population updated: 20 316
GRASP_PR population updated: 21 309
GRASP_PR population updated: 25 316
GRASP_PR population updated: 26 313
GRASP_PR population updated: 35 316
GRASP_PR population updated: 36 313
GRASP_PR population updated: 40 313
GRASP_PR population updated: 42 311
GRASP_PR population updated: 49 313
Cost of solution found with GRASP_PR: 309
"""

from random_generators import rand_sym_matrix    # Listing 12.1
from tsp_GRASP_PR import tsp_GRASP_PR            # Listing 10.7

print('Number of cities: ')
n = int(input())
print('Iterations: ')
iterations = int(input())
print('Population size: ')
population_size = int(input())
print('Alpha: ')
alpha = float(input())

distances = rand_sym_matrix(n, 1, 99)

tour, length = tsp_GRASP_PR(distances, iterations, population_size, alpha)
print('Cost of solution found with GRASP_PR: {:d}'.format(length))
Solutions to the Exercises


Problems of Chap. 1 


1.1 Draw 5 Segments 

A common mistake, when trying to solve this type of problem, is to take a sheet 
of paper and start drawing segments more or less at random. After unsuccessfully 
scribbling for a few minutes, we look for a more systematic proof. 

For instance, without loss of generality, we can assume that segment 1 cuts 
segments 2, 3, and 4. So, segment 1 does not cut segment 5. This implies that 
segment 5 necessarily cuts segments 2, 3, and 4. To complete the crossings, if we 
assume that segment 2 intersects segment 3, there is no longer any possibility for 
segment 4 to cut other segments. We can deduce, by enumeration of all the other 
possible hypotheses, that the problem has no solution. This way of proceeding is
not reasonable if we want to show it is impossible to have 301 segments, each of which
intersects 3 others: the combinatorics is such that we will never arrive at the end of
the demonstration.

A natural graph model (a crossing = a vertex; a segment = an edge) does not
lead to anything productive. In contrast, if a vertex corresponds to a segment and
an edge represents the relationship that two segments intersect, then the solution
of the problem becomes obvious. We are looking for a graph with five vertices of
degree 3, that is, a graph whose sum of degrees is equal to 5 · 3 = 15.
Since each edge contributes 2 to the sum of the degrees, this sum is even in any graph; we deduce the impossibility of
drawing 5 (or 301) segments such that each one crosses three others.
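The parity argument can also be checked mechanically on small cases. This brute-force sketch (the function name is ours, for illustration only) enumerates every simple graph on a few vertices and looks for one in which all vertices have the requested degree; it is exponential and only meant as a sanity check.

```python
from itertools import combinations


def regular_graph_exists(n_vertices, degree):
    """Brute force: does some simple graph on n_vertices vertices have
    every vertex of the given degree? Tries all 2^(number of pairs) edge sets."""
    edges = list(combinations(range(n_vertices), 2))
    for mask in range(1 << len(edges)):
        deg = [0] * n_vertices
        for bit, (u, v) in enumerate(edges):
            if mask >> bit & 1:          # edge (u, v) belongs to this graph
                deg[u] += 1
                deg[v] += 1
        if all(d == degree for d in deg):
            return True
    return False
```

With five vertices of degree 3 (sum 15, odd) no graph is found, whereas four or six vertices of degree 3 (even sums 12 and 18) admit one, e.g. the complete graph on four vertices.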


Fig. 1 Principle of a polynomial transformation of an asymmetric traveling salesman problem into
a symmetric one. (a) Two nodes i and j of the original directed graph. (b) Doubling of nodes and
weights in an undirected graph; M is a sufficiently large constant value. (c) A possibility of visiting
vertices i and j in the undirected graph, without having to pay the M penalty and by collecting a
bonus of 2M, corresponding to a visit in the order i → j in the directed graph. (d) The only other
reasonable possibility, corresponding to a visit in the order j → i.


1.2 O Simplification 


« OQ”) 
e oR”) 
© Q((n+2)!) 


e 2(nlog(log(n))) 
* O (n280) 
* On’) 


1.3 Turing Machine Program 
The transition function δ is given by the following table:


Symbol on the tape (T`) 


Sue |b p pe — — [e —— — 


(qa, à, 1) (qo. c, 1) (qo. e, 1) (qo. n, 1) 
(qa. à, 1) (qo. c, 1) (qo. e, 1) (dan, n, 1) 
(qa. a, 1) (qo. c, 1) (qo. n, 1) 


1.4 Clique is NP-Complete 

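Section 1.2.3.4 proves that finding a stable set of a given size is NP-complete. In the
complementary graph, this problem is equivalent to looking for a clique of the same size.
Since the complementary graph can be built in polynomial time, any stable set instance
transforms polynomially into a clique instance. As a candidate clique can be checked in
polynomial time, the clique problem is in NP and is therefore NP-complete.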
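The transformation amounts to building the complementary graph, which takes O(n²) time for n vertices. The following sketch, with hypothetical helper names, checks on a 5-cycle that a vertex subset is stable in G exactly when it is a clique in the complementary graph.

```python
from itertools import combinations


def complement(n, edges):
    """Edge list of the complement of a simple graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in present]


def is_stable(vertices, edges):
    """True if no two vertices of the subset are adjacent."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(pair) not in present for pair in combinations(vertices, 2))


def is_clique(vertices, edges):
    """True if every two vertices of the subset are adjacent."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(pair) in present for pair in combinations(vertices, 2))
```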


1.5 Asymmetric TSP to Symmetric TSP 
See Fig. 1. 
Fig. 2 For the numerical application, a Steiner point at approximate coordinates (31, 19) is
necessary (so that the three segments issuing from this point toward the vertices of the triangle
make angles of 120°). The Steiner tree thus constructed has a weight of about 114.197, while that
of the minimum spanning tree is about 130.413.


Problems of Chap. 2 


2.1 Connecting Points 

The problem here has not been formulated very precisely. This is a recurrent issue
between two interlocutors who do not have the same background, for example,
when the manager of a company tries to explain the functionality of an application to
a programmer. One forgets to mention certain constraints that seem obvious, and
the other tries to transcribe the problem into a known algorithm that does not
model reality satisfactorily. In this exercise, it has not been specified whether the
connections between the vertices had to be straight lines. If so, the solution consists
of drawing a minimum spanning tree and the problem is simple. If the connection
between two vertices is not necessarily a unique straight line segment, then the
problem is to seek a Steiner tree and the problem is intractable. See Fig. 2.


2.2 Accessibility by Lorries 

The goal is to maximize the smallest capacity (the bottleneck) along a chain from A to B.
The problem can be solved by calculating a maximum weight spanning tree of the network.
The unique chain connecting A and B in this tree has the largest possible bottleneck. For the
numerical application, the chain is A-5-1-7-4-3-8-B and its bottleneck has a value of 48.
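The exercise's network data is not reproduced here, so the following sketch runs on a small made-up graph. It assumes edges given as (weight, u, v) triples, builds a maximum weight spanning tree with Kruskal's algorithm (edges sorted by decreasing weight, union-find to reject cycles), and then reads the smallest weight on the tree chain between two vertices. The function names are illustrative.

```python
def max_spanning_tree(n, edges):
    """Kruskal's algorithm with edges (weight, u, v) sorted by decreasing weight.
    Returns the tree as adjacency lists of (neighbour, weight) pairs."""
    parent = list(range(n))

    def find(x):                      # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = [[] for _ in range(n)]
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:                  # the edge does not close a cycle
            parent[ru] = rv
            tree[u].append((v, w))
            tree[v].append((u, w))
    return tree


def bottleneck(tree, a, b):
    """Smallest edge weight on the unique tree chain from a to b (iterative DFS)."""
    stack = [(a, -1, float('inf'))]
    while stack:
        x, previous, smallest = stack.pop()
        if x == b:
            return smallest
        for y, w in tree[x]:
            if y != previous:
                stack.append((y, x, min(smallest, w)))
    return None                       # a and b are not connected
```

On the toy edges (10, 0, 1), (3, 0, 2), (8, 1, 2), (5, 2, 3), (2, 1, 3), the tree chain 0-1-2-3 has bottleneck 5, better than the alternative chains 0-2-3 (3) and 0-1-3 (2).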
Fig. 3 Connections to keep


Fig. 4 Successive residual networks when applying the Ford and Fulkerson algorithm. An arc of
very low capacity can be alternately used in the normal direction and then its flow canceled at the
next iteration.


2.3 Network Reliability 

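Since the weights correspond to probabilities that must be multiplied and not added,
a standard shortest path algorithm can be used by taking the opposite of the logarithm
of the probabilities as arc lengths: since log(p_1 p_2 ⋯ p_k) = log p_1 + ⋯ + log p_k and the
logarithm is increasing, minimizing the sum of the −log p values maximizes the product of
the probabilities. See Fig. 3.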
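A minimal sketch of this transformation, assuming directed arcs given as (u, v, p) with 0 < p ≤ 1 (an undirected link is listed in both directions); the function name is hypothetical. Dijkstra's algorithm is valid here because every length −log p is nonnegative.

```python
import heapq
import math


def most_reliable_path(n, arcs, s, t):
    """Most reliable s-t path. arcs: (u, v, p) with 0 < p <= 1 the probability
    that the link works. Dijkstra on lengths -log(p): minimizing their sum
    maximizes the product of the probabilities. Returns (path, reliability)."""
    adj = [[] for _ in range(n)]
    for u, v, p in arcs:
        adj[u].append((v, -math.log(p)))
    dist = [math.inf] * n
    pred = [None] * n
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue                  # outdated heap entry
        for v, w in adj[u]:
            if du + w < dist[v]:
                dist[v] = du + w
                pred[v] = u
                heapq.heappush(heap, (dist[v], v))
    if math.isinf(dist[t]):
        return None, 0.0              # t cannot be reached from s
    path = [t]
    while path[-1] != s:
        path.append(pred[path[-1]])
    return path[::-1], math.exp(-dist[t])
```

With arcs 0→1 and 1→2 of probability 0.9 and a direct arc 0→2 of probability 0.8, the two-arc path wins (0.81 > 0.8).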


2.4 Ford & Fulkerson Algorithm Degeneracy 
The number of iterations can grow up to the ratio between the largest and the smallest
arc capacities. See Fig. 4.
2.5 TSP Permutation Model
Problem instance data: n × n distance matrix D = (d_{ij}). Objective: find a
permutation p minimizing:

$$\sum_{i=1}^{n-1} d_{p_i p_{i+1}} + d_{p_n p_1}$$
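The objective can be evaluated directly from a permutation. This sketch (function name chosen for illustration, 0-based indices) adds the n − 1 consecutive distances and the closing edge back to the first city; it also works for asymmetric matrices.

```python
def tour_length(d, p):
    """Length of the tour visiting the cities in the order given by permutation p:
    d[p[0]][p[1]] + ... + d[p[n-2]][p[n-1]] plus the closing edge d[p[n-1]][p[0]]."""
    n = len(p)
    return sum(d[p[i]][p[i + 1]] for i in range(n - 1)) + d[p[n - 1]][p[0]]
```

For the asymmetric matrix d = [[0, 1, 5], [2, 0, 3], [4, 6, 0]], the permutation [0, 1, 2] gives 1 + 3 + 4 = 8, while [0, 2, 1] gives 5 + 6 + 2 = 13.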


2.6 PAM and k-Means Implementation 

The algorithmic complexity of these two procedures can hardly be expressed 
theoretically, because the loops in lines 9 and 6 are repeated an indeterminate 
number of times. In practice, we observe a number of repetitions more or less 
proportional to k for Algorithm 2.7. This number is much lower for Algorithm 2.8, 
typically less than 20. More precisely, the number of repetitions for Algorithm 2.7 
increases very weakly with n when k is constant (we observe an increase from 
O (n9?) to O (n93), approximately), while this increase is approximately linear for 
k = n/20. 

Although very often used in practice, the k-means algorithm does not produce 
good solutions if given random initial center positions. Starting from the solution 
provided by Algorithm 2.7 gives much better results. However, the computational 
time to obtain this solution can be prohibitive. 


2.7 Optimality Criterion 
The scheduling is optimal because the last machine has to wait the shortest possible 
time before starting to work, and then it works continuously until the end. 


2.8 Flowshop Makespan Evaluation 
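The earliest ending time f_{ij} for processing the ith object (p_i) on machine j is given
by the recurrence relation:

$$f_{ij} = \begin{cases} 0 & \text{if } i = 0 \text{ or if } j = 0\\ \max(f_{i-1,j},\, f_{i,j-1}) + t_{p_i j} & \text{otherwise} \end{cases}$$

The latest starting time d_{ij} of this operation is given by the recurrence relation:

$$d_{ij} = \begin{cases} f_{nm} & \text{if } i = n+1 \text{ or if } j = m+1\\ \min(d_{i+1,j},\, d_{i,j+1}) - t_{p_i j} & \text{otherwise} \end{cases}$$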
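The two recurrences translate directly into two nested loops. In this sketch (names and data layout are assumptions), `t[o][j]` is the processing time of object o on machine j and `p` is the processing order; the arrays are padded with a border row and column so that the boundary cases of the recurrences need no special treatment. An operation lies on a critical path when its earliest start f_{ij} − t_{p_i j} equals its latest start d_{ij}.

```python
def flowshop_times(t, p):
    """Earliest finish times f and latest start times d for the sequence p.

    t[o][j] is the processing time of object o on machine j (0-based);
    row/column 0 of f and row n+1 / column m+1 of d are borders, so that
    f[i][j] and d[i][j] refer to the ith object of p on machine j (1-based)."""
    n, m = len(p), len(t[0])
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j], f[i][j - 1]) + t[p[i - 1]][j - 1]
    makespan = f[n][m]
    d = [[makespan] * (m + 2) for _ in range(n + 2)]
    for i in range(n, 0, -1):
        for j in range(m, 0, -1):
            d[i][j] = min(d[i + 1][j], d[i][j + 1]) - t[p[i - 1]][j - 1]
    return f, d
```

For three objects with times [[3, 2], [1, 4], [2, 2]] on two machines, processed in the order 0, 1, 2, the makespan f[3][2] is 11 and d[1][1] = 0, so the first operation has no slack.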


2.9 Transforming the Knapsack Problem into the Generalized Assignment 
Let I be the set of n objects of revenue c_i and volume v_i, and V the volume of the
knapsack. We can create a generalized assignment problem with a set U consisting
of m = 2 elements. The element u = 1 corresponds to the objects that must be put
in the knapsack and the element u = 2 to those that remain outside. By solving a
generalized assignment problem with c_{i1} = 0, c_{i2} = c_i, w_{i1} = v_i, w_{i2} = 0, t_1 = V
and t_2 = 0, one minimizes the value of the objects remaining outside the knapsack
while satisfying the volume constraint for those put in the knapsack. Therefore,
since the knapsack problem is NP-hard and this transformation is polynomial, the
generalized assignment problem is also NP-hard.
Problems of Chap. 3


3.1 Assigning Projects to Students 
The assignment problem can be solved by finding a maximum flow with minimum 
cost in a bipartite graph. See Fig. 5. 


* First step: a takes 1.

* Second step: c takes 2.

* Third step: a changes and takes 3; b takes 1.

* Last step: a changes again and takes 4; d takes 3.


3.2 Placing Production Units 
Consider the distance matrix D, flow matrix F, and assignment costs D x F with: 


|4 5 6 

Fa a 53 5779 
b|59 39 37 
c|so 53 83 


Dx 


Fig. 5 The successive flow increases for the optimal assignment of the four projects 
If there are flows between the new units, the problem becomes a quadratic
assignment problem, which is NP-hard.


3.3 Oral Examination 

The problem can be modeled by coloring the edges of a bipartite graph. To ensure
we can decompose the graph into a minimum number of perfect matchings, we must
complete it so that there are as many modules as students. If there are
fewer modules than students, create dummy modules. Next, we add dummy edges
so that each vertex has the same degree. Thus, after finding a first perfect matching,
the corresponding edges can be removed, and we can start again with a graph
possessing the same properties.


3.4 Written Examination 

The problem can be modeled by coloring the vertices of a graph. The vertices correspond
to the modules to be examined. The edges correspond to incompatibilities
between modules: all the vertices (modules) a student must attend are pairwise
connected, forming a clique. The problem is intractable. For the numerical example,
4-day timetables exist, for instance, {1, 7}, {2, 5}, {3, 6}, {4, 8}.


3.5 QAP with More Positions Than Items 
If there are fewer elements (n) to place than positions (m), we can come back to the
standard case with two m × m matrices by adding m − n dummy elements with zero
flows to and from all the other elements.

If there is a fixed cost c_{ir} for assigning element i to position r, the objective must
be changed to:

$$\sum_{i=1}^{n}\sum_{j=1}^{n} f_{ij}\, d_{p_i p_j} + \sum_{i=1}^{n} c_{i p_i}$$
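The padding and the modified objective can be sketched as follows, assuming `p[i]` is the position assigned to element i, `c` has one row per element and one column per position, and dummy elements have zero flows and zero fixed costs. The function names are hypothetical, and the brute-force search is only meant for tiny instances.

```python
from itertools import permutations


def pad_flows(f, m):
    """Extend an n x n flow matrix to m x m with m - n dummy elements
    having zero flow to and from every element."""
    n = len(f)
    return [[f[i][j] if i < n and j < n else 0 for j in range(m)] for i in range(m)]


def pad_costs(c, m):
    """Add zero fixed-cost rows (one per dummy element) to an n x m cost matrix."""
    return [row[:] for row in c] + [[0] * m for _ in range(m - len(c))]


def qap_cost(f, d, c, p):
    """sum_i sum_j f[i][j] * d[p[i]][p[j]] + sum_i c[i][p[i]],
    where p[i] is the position assigned to element i."""
    n = len(p)
    flow_part = sum(f[i][j] * d[p[i]][p[j]] for i in range(n) for j in range(n))
    return flow_part + sum(c[i][p[i]] for i in range(n))


def brute_force_qap(f, d, c):
    """Exhaustive search over all assignments (tiny instances only)."""
    return min(permutations(range(len(d))), key=lambda p: qap_cost(f, d, c, p))
```

With two real elements and three positions, the padded instance is an ordinary 3 × 3 QAP; the position left to the dummy element contributes nothing to the objective.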


3.6 Mobile Phone Keyboard Layout 

The problem can be modeled by a QAP. As we only have 27 symbols to place for 36
positions, we extend the frequency matrix to a 36 × 36 matrix. Let us consider
the sub-matrices:

$$P = \begin{pmatrix} 2&3&4&5\\ 2&3&4&5\\ 2&3&4&5\\ 2&3&4&5 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 7&8&9&10\\ 7&8&9&10\\ 7&8&9&10\\ 7&8&9&10 \end{pmatrix}$$


The “distance” matrix (corresponding to times) is given by: 
3.7 Graph Bipartition to QAP
Let A be the adjacency matrix of the graph (with 2n vertices). Let 0 and 1 be two n × n
matrices containing only 0s and only 1s, respectively. The QAP instance with flow matrix
given by A and distance matrix given by $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ is equivalent to the graph bipartition problem.


3.8 TSP to QAP 
A TSP instance with a distance matrix D can be transformed into a QAP one with
the same distance matrix and the flow matrix given by:

$$f_{ij} = \begin{cases} 1 & \text{if } j = i + 1 \text{ or if } i = n \text{ and } j = 1\\ 0 & \text{otherwise} \end{cases}$$
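A quick way to convince oneself of this equivalence is to compute both values. In this sketch (hypothetical names; 0-based indices, so the wrap-around case becomes i = n − 1, j = 0), `p[i]` is the city visited at tour position i, and the QAP value then equals the length of the tour.

```python
def tsp_flow_matrix(n):
    """Flow matrix of the transformation with 0-based indices:
    f[i][j] = 1 if j == i + 1, or if i == n - 1 and j == 0; 0 otherwise."""
    return [[1 if j == i + 1 or (i == n - 1 and j == 0) else 0 for j in range(n)]
            for i in range(n)]


def qap_value(f, d, p):
    """QAP objective sum_i sum_j f[i][j] * d[p[i]][p[j]]."""
    n = len(p)
    return sum(f[i][j] * d[p[i]][p[j]] for i in range(n) for j in range(n))
```

For d = [[0, 3, 7, 2], [4, 0, 1, 6], [5, 8, 0, 9], [2, 4, 3, 0]], the order [0, 1, 2, 3] gives 3 + 1 + 9 + 2 = 15, which is also its tour length.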


3.9 Special Bipartition 
A binary vector can model a solution s and an example of a fitness function is: 


50 


50 2 
(um- Xia- s) + 36.000 — llós 
i=l 


i=1 


3.10 Magic Square 

We could model a solution attempt for a magic square of order n by a permutation
of the elements from 1 to n². A fitness function could be the sum of the squares of the
deviations from the target sum. But why on earth design a heuristic method for this
simple problem? Indeed, polynomial algorithms exist for the construction of magic
squares (except for n = 2, for which no magic square exists).
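For odd n, one such polynomial construction is the classical Siamese (De la Loubère) method, sketched below together with the fitness function mentioned above; the function names are ours, and even orders require other constructions not shown here.

```python
def siamese_magic_square(n):
    """Siamese (De la Loubere) construction of a magic square of odd order n."""
    if n % 2 == 0:
        raise ValueError('this construction only works for odd n')
    square = [[0] * n for _ in range(n)]
    i, j = 0, n // 2                        # start in the middle of the top row
    for k in range(1, n * n + 1):
        square[i][j] = k
        ni, nj = (i - 1) % n, (j + 1) % n  # move up and right, wrapping around
        if square[ni][nj]:                 # cell already filled: move down instead
            ni, nj = (i + 1) % n, j
        i, j = ni, nj
    return square


def fitness(square):
    """Sum of squared deviations of rows, columns and both diagonals
    from the target sum n(n^2 + 1)/2; zero for a magic square."""
    n = len(square)
    target = n * (n * n + 1) // 2
    lines = [list(row) for row in square] + [list(col) for col in zip(*square)]
    lines.append([square[i][i] for i in range(n)])
    lines.append([square[i][n - 1 - i] for i in range(n)])
    return sum((sum(line) - target) ** 2 for line in lines)
```

For n = 3 it yields the classical square with rows 8 1 6, 3 5 7 and 4 9 2, whose fitness is 0.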


3.11 Glass Plate Manufacturing 

The no-wait flowshop sequencing problem can be modeled as an asymmetric TSP as 
follows. A fictitious plate is included with zero processing time on all the machines. 
The minimum difference between the starting time of the plate i and that of the plate 
k is given by: 


The (d_{ik}) matrix corresponds to the TSP distances. The optimum tour length
corresponds to the minimum production time of the plates. The order of production 
is the same as that of the tour, taking care to start with the fictitious plate/city. 


3.12 Optimal 1-tree 

Choosing λ_1 = 0, λ_2 = 0, λ_3 = 6, λ_4 = −4, and λ_5 = −2 provides a 1-tree of
weight 74. This 1-tree corresponds to the circuit 1-2-4-3-5-1, which is
therefore the optimal TSP tour for this instance.
Problems of Chap. 4


4.1 Random Permutation 

The more the number of selected elements increases, the more unnecessary random
draws must be made with Algorithm 4.5. At the last iteration, n trials are made on
average, although only one choice remains. Algorithm 4.6 can be implemented in Θ(n),
but the permutations are not uniformly drawn.

An efficient algorithm for generating uniformly distributed random permutations
is given in Code 12.1. The operating principle is as follows. All the items are
introduced in an array p. At the ith step, the array contains randomly chosen items up to
index i. Beyond this index are the items remaining to be chosen.

We can check that a given item has the same probability of being in any place
in the array p. Trivially, it has a probability of 1/n to appear in the first place. The
probability of appearing in second place is calculated by considering that it should
not be chosen for the first place (probability (n − 1)/n) and that it is chosen for the
second (probability 1/(n − 1)), i.e. (n − 1)/n · 1/(n − 1) = 1/n; etc.
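A minimal Python sketch of this principle (the Fisher-Yates shuffle); it is not necessarily identical to Code 12.1, and the function name `random_permutation` is chosen here for illustration. At step i, one of the items in positions i to n − 1 is drawn uniformly and swapped into position i.

```python
import random


def random_permutation(n, rng=random):
    """Uniform random permutation of 0..n-1 (Fisher-Yates shuffle).
    After step i, p[:i + 1] holds the chosen items and p[i + 1:] the remaining ones."""
    p = list(range(n))
    for i in range(n - 1):
        r = rng.randrange(i, n)   # uniform among the items not yet chosen
        p[i], p[r] = p[r], p[i]
    return p
```

Each step costs O(1), so the whole permutation is built in Θ(n) without any rejected draw.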


4.2 Greedy Algorithms for the Knapsack 
The incremental cost of adding an item to the knapsack can be defined by: 


* The inverse of its revenue
* Its volume
* Its volume/revenue ratio
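As an illustration of the last criterion, this sketch (hypothetical function name, positive revenues assumed) inserts the items by increasing volume/revenue ratio, skipping those that no longer fit.

```python
def greedy_knapsack(revenues, volumes, capacity):
    """Add items by increasing volume/revenue ratio (incremental cost),
    skipping those that no longer fit. Returns (chosen items, total revenue)."""
    order = sorted(range(len(revenues)), key=lambda i: volumes[i] / revenues[i])
    chosen, used, total = [], 0, 0
    for i in order:
        if used + volumes[i] <= capacity:
            chosen.append(i)
            used += volumes[i]
            total += revenues[i]
    return chosen, total
```

With revenues 60, 100, 120, volumes 10, 20, 30 and capacity 50, it returns items 0 and 1 (revenue 160), whereas the optimum takes items 1 and 2 (revenue 220): greedy choices are not necessarily optimal.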


4.3 Greedy Algorithm for the TSP on the Delaunay 

If the points are collinear, the Delaunay triangulation degenerates into a chain. In this
case, we cannot build a tour. Figure 6 shows a nondegenerate Delaunay triangulation with
no Hamiltonian tour.


Fig. 6 A non-Hamiltonian 
Delaunay triangulation 
Fig. 7 Applying the beam search procedure to a TSP instance. At each level, the two best partial
solutions are retained and the tree is developed up to three levels.


4.4 TSP with Edge Subset 

The construction of a tour can indeed be completed in linear time if each city
is only connected to its 40 nearest ones. However, these nearest cities cannot be obtained
in linear time. In addition, all the cities closest to the last one visited may already
belong to the tour under construction.


4.5 Constructive Methods Complexity 
The nearest neighbor heuristic for the TSP can be implemented in O(n²) (see
Code 4.3). For the beam search, there are O(n) elements to examine for each of
the p partial solutions retained at a given level. Since the partial tree is examined up to
level k, the complexity is in O(nkp) for each element to be added. Since there are n elements
to add, the global complexity is in O(kpn²).

For the pilot method, each of the O(n) candidate elements is completed by the nearest
neighbor pilot heuristic in O(n²), so there is O(n³) work to do before including an element.
The global complexity is therefore in O(n⁴), which is confirmed by numerical
experiments (see Fig. 11.4).


4.6 Beam Search and Pilot Method Applications 
For this problem instance, both the beam search and the pilot methods produce the
solution 1 → 2 → 5 → 3 → 4 → 1 of length 43. Figure 7 provides the partial
solutions successively built for the beam search.

Figure 8 provides the solutions built with the pilot method. 


4.7 Greedy Algorithm Implementation for Scheduling 
The simplest way is to calculate the increase of the production time if the next object 
is included at the end (or the beginning) of a sequence. The object with the lowest 
increase is selected. 

A more complex method is to try to insert the next object at all possible positions 
in the partial sequence. This heuristic, called NEH, is thoroughly studied in the 
literature. 
Fig. 8 Applying the pilot method to a TSP instance. The pilot heuristic is the nearest neighbor.
The partial solutions completed by the pilot heuristic are drawn with dotted lines 


4.8 Greedy Methods for the VRP 
A greedy method is given by Algorithm 1. 


1 Create n tours warehouse → i → warehouse (warehouse = city 0)
2 forall i, j = 1, ..., n do  // Compute savings
3     s_ij = c_i0 + c_0j − c_ij  // savings that can be achieved by merging tours i and j
4 Sort the s_ij by decreasing value
5 forall i, j, in the order of the s_ij do
6     if i is at the end of a tour and j at the beginning of another one and the sum of the
      demands of both tours ≤ vehicle capacity then  // Merge the tours
7         Add the arc i → j and remove the arcs i → 0 and 0 → j


Algorithm 1 Clarke and Wright's savings greedy algorithm is often cited for building a solution
for the vehicle routing problem or for the traveling salesman problem.


One can also choose any greedy heuristic for the TSP and initialize it with the
depot. As long as there is space left in the vehicle, customers are added to the tour
under construction. Then, the process starts again with a new vehicle.
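A compact Python sketch of Algorithm 1, assuming a cost matrix `c` indexed from the depot 0, integer demands and a single vehicle capacity; the function name and data layout are illustrative. As in Algorithm 1, tours are never reversed, so i must end one tour and j must start another one. It is an illustration, not a tuned implementation.

```python
def clarke_wright(c, demand, capacity):
    """Savings heuristic following Algorithm 1 (no tour reversal).

    c: (n+1) x (n+1) cost matrix, index 0 being the depot; demand[i] is the
    demand of customer i (demand[0] unused). Returns the tours as lists of
    customers, the depot being implicit at both ends of each tour."""
    n = len(c) - 1
    tours = {i: [i] for i in range(1, n + 1)}       # tour id -> customer sequence
    load = {i: demand[i] for i in range(1, n + 1)}  # tour id -> total demand
    tour_of = {i: i for i in range(1, n + 1)}       # customer -> tour id
    savings = sorted(((c[i][0] + c[0][j] - c[i][j], i, j)
                      for i in range(1, n + 1) for j in range(1, n + 1) if i != j),
                     reverse=True)
    for _, i, j in savings:
        ti, tj = tour_of[i], tour_of[j]
        if ti == tj or tours[ti][-1] != i or tours[tj][0] != j:
            continue                  # i must end a tour and j must start another one
        if load[ti] + load[tj] > capacity:
            continue                  # merged tour would exceed the vehicle capacity
        tours[ti].extend(tours[tj])   # add arc i -> j, remove arcs i -> 0 and 0 -> j
        load[ti] += load[tj]
        for customer in tours[tj]:
            tour_of[customer] = ti
        del tours[tj], load[tj]
    return list(tours.values())
```

On four unit-demand customers grouped in two distant pairs, with a capacity of 2, the sketch merges each pair into one tour.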


Problems of Chap. 5 


5.1 Local Minima 
See Fig. 9. 

The neighborhood has the connectivity property: x can be increased by 1 unit
with the compound move +4, −3, and x can be decreased by 1 unit with the compound
move −3, −3, +4, +4, −3.
Fig. 9 Local minima of a function of a discrete variable x relative to a neighborhood consisting
of either adding 4 to or subtracting 3 from x


5.2 Minimizing an Explicit Function 
With the first improvement move policy, the following sequences of values are 
obtained: 


650 → 585 → 495 → 428 → 372 → 336 → 304 → 256 → 234 → 230 →
222 → 210 → 157 → 126 → 116 → 97 → 85 → 70 → 58 → 34 → 17 →
14 → 5 → −4


510 → 472 → 457 → 435 → 377 → 368 → 278 → 212 → 156 → 118 →
89 → 83 → 74 → 65 → 57 → 38 → 33 → 25 → 14 → 5 → −4.


With the best improvement move policy, the following sequences of values are
obtained:


248 → 193 → 138 → 123 → 89 → 58 → 34 → 17 → 14 → 5 → −10
92 → 58 → 35 → 32 → 17 → −4


5.3 2-opt and 3-opt Neighborhood Properties 
The 2-opt move (i, j) changes the tour
i → s_i ⇝ j → s_j ⇝ i
to tour:
i → j ⇝ s_i → s_j ⇝ i
If j is the direct successor of s_i, we can therefore swap two adjacent cities with
a 2-opt move. We deduce that the 2-opt neighborhood possesses the connectivity
property since we can sort any array with a sequence of adjacent swaps.
The 3-opt move (i, j, k) changes the tour
i → s_i ⇝ j → s_j ⇝ k → s_k ⇝ i
to tour:
i → s_j ⇝ k → s_i ⇝ j → s_k ⇝ i

By successively applying the 2-opt moves (i, j), (k, i), (k, s_i), the 3-opt move
(i, j, k) is achieved:

i → j ⇝ s_i → s_j ⇝ k → s_k ⇝ i
k → i ⇝ s_k → j ⇝ s_i → s_j ⇝ k
k → s_i ⇝ j → s_k ⇝ i → s_j ⇝ k

With a 2-opt move, one can place any city after any other. Therefore, one can
transform any permutation into any other in n − 1 steps at most. A 3-opt move
allows each city to be moved individually to any place. Thus, the 2-opt and 3-opt
neighborhoods have a diameter smaller than n.
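These tour manipulations can be checked on explicit city sequences. In this sketch (function names are ours), a tour is a Python list read cyclically, a 2-opt move (a, b) reverses the sub-path s_a ⇝ b, and the direct 3-opt reconnection is compared with the result of the three successive 2-opt moves listed above.

```python
def two_opt_move(tour, a, b):
    """2-opt move (a, b): remove edges (a, succ(a)) and (b, succ(b)), add (a, b)
    and (succ(a), succ(b)); the path succ(a) ~> b is reversed. Returns a new tour."""
    r = tour.index(a)
    t = tour[r:] + tour[:r]          # rotate so that a comes first
    m = t.index(b)
    return t[:1] + t[1:m + 1][::-1] + t[m + 1:]


def three_opt_move(tour, i, j, k):
    """Direction-preserving 3-opt move (i, j, k):
    i -> s_i ~> j -> s_j ~> k -> s_k ~> i  becomes  i -> s_j ~> k -> s_i ~> j -> s_k ~> i."""
    r = tour.index(i)
    t = tour[r:] + tour[:r]
    mj, mk = t.index(j), t.index(k)
    return t[:1] + t[mj + 1:mk + 1] + t[1:mj + 1] + t[mk + 1:]


def same_cycle(t1, t2):
    """True if t1 and t2 describe the same directed cycle (up to rotation)."""
    r = t2.index(t1[0])
    return t1 == t2[r:] + t2[:r]
```

For instance, on the tour 0, 1, ..., 9 with (i, j, k) = (0, 3, 6) and s_i = 1, applying two_opt_move with (0, 3), then (6, 0), then (6, 1) gives the same cycle as three_opt_move(tour, 0, 3, 6); and two_opt_move([0, 1, 2, 3, 4], 0, 2) swaps the adjacent cities 1 and 2.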


5.4 3-opt for Symmetric TSP 
There are four possibilities: 


i → s_j ⇝ k → s_i ⇝ j → s_k ⇝ i
i → s_j ⇝ k → j ⇝ s_i → s_k ⇝ i
i → j ⇝ s_i → k ⇝ s_j → s_k ⇝ i
i → k ⇝ s_j → s_i ⇝ j → s_k ⇝ i


Only the first possibility respects the direction of travel of the three sub-paths. 
5.5 4- and 5-opt 
For 4-opt, there is only one possibility respecting the travel direction. The 4-opt 
move (i, j, k, u) changes the tour 
i → s_i ⇝ j → s_j ⇝ k → s_k ⇝ u → s_u ⇝ i
to tour:
i → s_k ⇝ u → s_j ⇝ k → s_i ⇝ j → s_u ⇝ i
This move is also called double-bridge. For 5-opt, there are eight different possibilities
respecting the direction of travel of the five sub-paths.


5.6 Comparing 2-opt Best and First 

To verify that a solution is 2-optimal, we must test n(n — 1)/2 — 2 moves. The 
number of repetitions of the while loop in Code 5.4 grows very slowly with the 
size of the problem (empirically between n°! and n™!7) because this procedure 
removes almost all crossings on the first pass. 

For Code 5.1, this number of repetitions is almost linear (proportional to n!:'), 
As only one move is performed at each iteration, it can be predicted that, on average, 
each node is involved a constant number of times in a move during the optimization 
process. 

This increase is virtually independent of the starting solution, but the absolute 
number of repetitions is about 11 times higher when starting from a random solution 
than when starting from a solution constructed with a greedy algorithm. 


5.7 3-opt Candidate List 

A 3-opt move is defined by a triplet (i, j, k). If j and k are limited to 40 values,
we can evaluate this limited neighborhood in O(n). Indeed, for each i, there are at
most 40 · 39 neighbors to evaluate. To be able to evaluate this neighborhood, it is
necessary to ensure that the city j is indeed on the path i ⇝ k and not on the path
k ⇝ i. It is thus necessary to have a data structure which can supply this information
in constant time. This can be the respective position of each city in the tour.

This limited neighborhood is no longer connected. 
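With positions stored in an array `pos`, whether j lies on the path i ⇝ k reduces to comparing offsets modulo n. The sketch below (hypothetical names; cities numbered 0 to n − 1; both j and k assumed to be drawn from a candidate list attached to i) enumerates the triplets of the restricted neighborhood.

```python
def on_path(pos, a, b, c):
    """True if city b is met before (or at) city c when travelling from a along
    the tour; pos[x] is the position of city x in the tour."""
    n = len(pos)
    return (pos[b] - pos[a]) % n <= (pos[c] - pos[a]) % n


def limited_3opt_triplets(tour, cand):
    """Yield the triplets (i, j, k) of the 3-opt neighborhood restricted to
    j, k in cand[i], keeping only those where j lies on the path i ~> k."""
    n = len(tour)
    pos = [0] * n
    for position, city in enumerate(tour):
        pos[city] = position
    for i in tour:
        for j in cand[i]:
            for k in cand[i]:
                if len({i, j, k}) == 3 and on_path(pos, i, j, k):
                    yield i, j, k
```

Each test is O(1), so with candidate lists of 40 cities the whole restricted neighborhood is scanned in O(40 · 39 · n).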


5.8 VRP Neighborhoods 
It is not elementary to construct a feasible solution with a specified number m of
tours due to capacity limitations. It is indeed a bin packing problem, which is NP-hard
(generalization of the set bipartition problem). We can introduce a dummy tour,
of unlimited capacity, corresponding to the customers not served by the m ordinary
tours. A penalty is associated with each customer on this tour (e.g., the distance
depot-customer-depot).

Here are some neighborhoods with the connectivity property for the relaxed 
problem: 


* Take customer i and optimally insert it into tour k (dummy or not). Neighborhood
size: O(nm).

* Swap customers i and j belonging to different tours. Size of the neighborhood:
O(n²).

* Exchange the beginning of a tour (up to customer i) and the beginning of another
one (up to customer j). Neighborhood size: O(n²).

* Exchange a portion of one tour (between customers i and j) with a portion of
another (between customers r and s). Neighborhood size: O(n⁴) (or O(m²) if we
assume that the number of customers per tour is bounded by a constant).
5.9 Steiner Tree Neighborhood
Model with Steiner vertices to be retained: 


* Introduce a new Steiner vertex or delete a Steiner vertex.
* Introduce a new vertex and delete one simultaneously.


This second neighborhood is not connected. Computing the value of a neighboring
solution: apply a minimum spanning tree algorithm. The complexity is O(m + n log n),
where m is the number of edges and n is the number of vertices.

Model with connected graph containing incident edges to all terminal vertices: 
Introducing a new edge (and deleting one if a cycle is created) or deleting one 
edge (and introducing another if the terminal vertices are no longer in the same 
connected component). This also implies deleting other edges if there is a connected 
component solely composed of Steiner vertices. The computation of the solution 
value is performed using a graph exploration algorithm which is in O (m + n). 


5.10 Ejection Chain for the VRP 

The chain is initiated by the ejection of a customer i from a tour. The reference 
structure is a set of tours + an isolated customer. To try a new solution, one 
attempts to insert i into a tour with sufficient capacity to accept it. The chain can 
be propagated by inserting i into another tour k while simultaneously ejecting a 
customer j not yet ejected from that tour. The modified tour k must be feasible and 
satisfy the capacity constraints. For the propagation, j replaces i. The ejection chain 
ends if: 


* There is no more possible ejection (all the vertices, or a maximum number of them,
were ejected).

* No tour has sufficient capacity to accept i, even after deleting j.

* A tried solution is retained.


The complexity of an ejection chain can be established as follows, assuming that 
an ejected customer is removed by going directly from its predecessor to its 
successor, and that a customer is inserted at the best possible place in the tour. 
During chain propagation, we try inserting i into the remaining m − 1 tours, testing 
for each insertion all candidates for ejection. This can be done in $O(n)$. The maximum 
length of the chain is also in $O(n)$. Therefore, a chain can be evaluated in $O(n^2)$. 
Since there are n different ways to initiate a chain, the overall complexity is in $O(n^3)$. 
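A compact sketch of one ejection chain follows. It assumes a depot at vertex 0, a distance matrix `d` indexed as `d[a][b]`, a `demand` dictionary and a single vehicle `capacity`. All function names are hypothetical. For readability it recomputes tour costs naively instead of using the O(n) bookkeeping assumed above, and it stops at the first tried solution that improves on the initial cost, returning `None` when the chain fails.

```python
def tour_cost(d, tour):
    """Length of a tour leaving from and returning to the depot (vertex 0)."""
    route = [0] + tour + [0]
    return sum(d[a][b] for a, b in zip(route, route[1:]))

def insert_best(d, tour, c):
    """Copy of tour with customer c inserted at its cheapest position."""
    return min((tour[:p] + [c] + tour[p:] for p in range(len(tour) + 1)),
               key=lambda t: tour_cost(d, t))

def ejection_chain(d, demand, capacity, tours, i):
    """Run one chain started by ejecting customer i.
    Returns improved tours, or None if the chain fails."""
    tours = [t[:] for t in tours]
    initial_cost = sum(tour_cost(d, t) for t in tours)
    source = next(k for k, t in enumerate(tours) if i in t)
    tours[source].remove(i)         # reference structure: tours + isolated customer
    ejected, current = {i}, i
    while True:
        load = [sum(demand[c] for c in t) for t in tours]
        base = sum(tour_cost(d, t) for t in tours)
        # Try a solution: insert the isolated customer where capacity allows
        for k, t in enumerate(tours):
            if k != source and load[k] + demand[current] <= capacity:
                new_t = insert_best(d, t, current)
                if base - tour_cost(d, t) + tour_cost(d, new_t) < initial_cost:
                    tours[k] = new_t
                    return tours    # the tried solution is retained
        # Propagate: insert current into tour k and eject a customer j from it
        best = None
        for k, t in enumerate(tours):
            if k == source:
                continue
            for j in t:
                if j in ejected or load[k] - demand[j] + demand[current] > capacity:
                    continue
                new_t = insert_best(d, [c for c in t if c != j], current)
                delta = tour_cost(d, new_t) - tour_cost(d, t)
                if best is None or delta < best[0]:
                    best = (delta, k, j, new_t)
        if best is None:
            return None             # no feasible ejection left: the chain fails
        _, k, j, new_t = best
        tours[k] = new_t
        source, current = k, j      # j replaces i for the propagation
        ejected.add(j)
```

Trying `ejection_chain` for each of the n customers gives the n possible chain initiations counted in the complexity analysis.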


Problems of Chap. 6 


6.1 Dichotomic Search Complexity 

The dichotomic search in a sorted array proceeds by dividing the array into b = 2 
parts. Only one part (a = 1) has to be processed recursively. For this problem, there 
is no reconstruction, and the computational effort between two recursive calls is 
constant ($f(n) = O(1)$), so that $T(n) = T(n/2) + O(1)$. 




[Figure: Machine versus Time diagram showing the earliest finish time, the sub-problem, the latest start time and the optimized sub-problem]


Fig. 10 POPMUSIC for the flowshop scheduling problem: the sequences of objects preceding and 
succeeding those defining the sub-problem are not changed 


Referring to the second case of the master theorem, we find that $T(n) = \Theta(n^{\log_2 1} \cdot \log n) = \Theta(\log n)$. 
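The recurrence can be observed directly by counting recursive calls. In the minimal sketch below (the name `dichotomic_search` is hypothetical), each call does a constant amount of work and recurses on at most half of the remaining range, so the number of calls on an array of n elements never exceeds about log2(n) + 2.

```python
def dichotomic_search(arr, x, lo=0, hi=None):
    """Search x in the sorted slice arr[lo:hi].
    Returns (index of x or -1, number of calls made)."""
    if hi is None:
        hi = len(arr)
    if lo >= hi:                     # empty range: x is absent
        return -1, 1
    mid = (lo + hi) // 2             # b = 2 parts, O(1) work per call
    if arr[mid] == x:
        return mid, 1
    if arr[mid] < x:                 # only one part (a = 1) is explored
        idx, calls = dichotomic_search(arr, x, mid + 1, hi)
    else:
        idx, calls = dichotomic_search(arr, x, lo, mid)
    return idx, calls + 1
```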


6.2 POPMUSIC for the Flowshop Sequencing Problem 

In the context of permutation flowshop scheduling, each object of the sequence (a 
job) can represent a part for POPMUSIC. A sub-problem consists of r consecutive 
objects in the sequence. A sub-problem is optimized under constraints on the earliest 
start and latest finish times imposed by the objects preceding and succeeding it (see Fig. 10). 
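The simplified sketch below, with hypothetical names, applies this decomposition: each window of r consecutive jobs is re-optimized by enumerating its permutations while the jobs before and after it stay fixed, and the sweep is repeated until no window improves. Instead of handling the earliest start and latest finish times explicitly, it re-evaluates the makespan of the whole sequence, which gives the same decisions but is slower. `p[j][k]` is assumed to be the processing time of job j on machine k.

```python
from itertools import permutations

def makespan(p, seq):
    """Makespan of a permutation flowshop; p[j][k] = time of job j on machine k."""
    finish = [0] * len(p[0])          # completion time of the last job on each machine
    for j in seq:
        finish[0] += p[j][0]
        for k in range(1, len(finish)):
            finish[k] = max(finish[k], finish[k - 1]) + p[j][k]
    return finish[-1]

def popmusic_flowshop(p, seq, r):
    """Re-optimize every window of r consecutive jobs by enumeration,
    keeping the jobs before and after the window unchanged."""
    seq = list(seq)
    best = makespan(p, seq)
    improved = True
    while improved:                   # stop when no sub-problem improves
        improved = False
        for start in range(len(seq) - r + 1):
            window = seq[start:start + r]
            for perm in permutations(window):
                candidate = seq[:start] + list(perm) + seq[start + r:]
                value = makespan(p, candidate)
                if value < best:
                    seq, best, improved = candidate, value, True
    return seq, best
```

With r equal to the number of jobs, the single sub-problem is the whole instance and the enumeration returns an optimal sequence; small values of r keep each sub-problem cheap.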


6.3 Algorithmic Complexity of POPMUSIC 

The most complex part of implementing a POPMUSIC method is obtaining an 
appropriate initial solution. The structure of the initial solution is critical for the 
method to provide good solutions. It is important to be able to produce this 
solution with an algorithmic complexity as low as possible. If these conditions are 
fulfilled, the most significant contribution to the complexity of the framework is 
the identification of the r parts that make up a sub-problem. If the computational 
effort to identify a sub-problem depends on the size of the problem, the empirical 
complexity of the framework is no longer linear. 


6.4 Minimizing POPMUSIC Complexity for the TSP 

The complexity of building a sample tour is $O(n^{ah})$. The complexity of building the 
initial tour containing all the cities is $O(n^{h+1})$. The complexity of the optimization 
with POPMUSIC is $O(n^{1+a-ah})$. For $a < 1 + \sqrt{2}$, the minimum complexity is 
reached for $h^* = \frac{a}{1+a}$; for this value, the global complexity is $O(n^{\frac{1+2a}{1+a}})$. This is the 
typical situation for a first improvement local search with 2-opt neighborhood. 

For $a \geq 1 + \sqrt{2}$, the minimum complexity is reached for $h^* = \frac{1+a}{2a}$; for this 
value, the global complexity is $O(n^{\frac{1+a}{2}})$. This is the typical situation for a local 
search based on Lin-Kernighan neighborhood. 

Figure 11 illustrates the complexity of each step of the method as a function of 
h. 
Fig. 11 Diagram used for determining the lowest possible algorithmic complexity as a function 
of h. Left, when a = 2; right, when a = 3 
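To check the two regimes numerically, the short sketch below (hypothetical names) takes the largest of the three step exponents ah, h + 1 and 1 + a − ah for a given h, and searches h on a regular grid of [0, 1]. For a = 2 it finds h close to 2/3 with a dominant exponent close to 5/3, and for a = 3 it finds h close to 2/3 with an exponent close to 2, which corresponds to the two panels of Fig. 11.

```python
def dominant_exponent(a, h):
    """Exponent of the most expensive step: sample tour n^(ah),
    initial tour n^(h+1), POPMUSIC optimization n^(1+a-ah)."""
    return max(a * h, h + 1, 1 + a - a * h)

def best_h(a, steps=10000):
    """Grid search over h in [0, 1] for the lowest dominant exponent."""
    h = min((i / steps for i in range(steps + 1)),
            key=lambda h: dominant_exponent(a, h))
    return h, dominant_exponent(a, h)
```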


Problems of Chap. 7 


7.1 SA Duration 


A simulated annealing starting with an initial temperature $T_0$ and ending when 
the temperature reaches $T_f$ performs $\frac{\log T_f - \log T_0}{\log \alpha}$ iterations if the temperature is 
multiplied by α at each iteration. 
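The formula can be checked against a direct count of geometric cooling steps. In this small sketch (hypothetical function names), the loop counts whole iterations, so it returns the value of the formula rounded up: with T0 = 100, Tf = 1 and α = 0.9, the formula gives about 43.7 and the loop performs 44 multiplications.

```python
import math

def sa_iterations(t0, tf, alpha):
    """Closed form: number of cooling steps from t0 down to tf."""
    return (math.log(tf) - math.log(t0)) / math.log(alpha)

def count_cooling_steps(t0, tf, alpha):
    """Direct count: multiply the temperature by alpha until it reaches tf."""
    t, steps = t0, 0
    while t > tf:
        t *= alpha
        steps += 1
    return steps
```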


7.2 Tuning GRASP 

For this problem instance, the parameter α has almost no influence! Since the 
starting city is randomly selected in the greedy construction, the latter produces 
varied initial solutions, even if α = 0. The local search of Code 12.3 used in this 
function produces relatively good quality solutions, even if the starting solution is 
extremely bad. With a less efficient local search, for example Code 5.4, it is better 
to choose α close to 1. 


7.3 VNS with a Single Neighborhood 

When a single neighborhood $M$ is available, a convenient way to implement a 
variable neighborhood search is to consider that a random move in the k-th 
neighborhood $M_k$ corresponds to k successive random moves in $M$. 
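A minimal sketch of this idea for permutation problems follows, assuming that the single neighborhood M is the swap of two elements. The functions `shake` and `vns`, as well as the `cost` and `local_search` callables they receive, are hypothetical. The neighborhood index k returns to 1 after an improvement and otherwise increases, cycling back to 1 after `k_max`.

```python
import random

def shake(solution, k, rng):
    """Random move of M_k, taken as k successive random swaps (moves of M)."""
    s = solution[:]
    for _ in range(k):
        u, v = rng.randrange(len(s)), rng.randrange(len(s))
        s[u], s[v] = s[v], s[u]
    return s

def vns(cost, local_search, solution, k_max, iterations, seed=0):
    """Basic VNS in which the k-th neighborhood is simulated with k moves of M."""
    rng = random.Random(seed)
    best = local_search(solution[:])
    best_cost = cost(best)
    k = 1
    for _ in range(iterations):
        candidate = local_search(shake(best, k, rng))
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost, k = candidate, candidate_cost, 1   # back to M_1
        else:
            k = k % k_max + 1          # next neighborhood, cycling after k_max
    return best, best_cost
```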


7.4 Record to Record 

The variable neighborhood search implementation can be improved by performing 
two random swaps at each iteration rather than an increasing number of moves if 
the solution has not been improved. Code 1 implements such a method. 




Code 1 tsp_record_to_record.py Implementation of a record-to-record method. The solution is 
perturbed by performing two random swaps in the best solution achieved. The method for repairing 
a perturbed solution is an ejection chain. The method proposed by Dueck [1] includes an additional 
parameter: a tolerance on the degradation of the solution obtained after the local search. 
The code provided here therefore corresponds to zero tolerance. 


```python
from random_generators import unif   # Listing 12.1
from tsp_utilities import tsp_length  # Listing 12.2
from tsp_LK import tsp_LK             # Listing 12.3

######### Record to record iterative local search for the TSP
def tsp_record_to_record(d,             # Distance matrix
                         iterations,    # Number of iterations
                         best_tour,     # TSP tour
                         best_length):  # Length of best_tour
    n = len(d[0])
    for iteration in range(iterations):
        tour = best_tour[:]  # No tolerance: always revert to best tour
        for _ in range(2):   # Perturbate solution with two random swaps
            u = unif(0, n - 1)
            v = unif(0, n - 1)
            tour[u], tour[v] = tour[v], tour[u]
        length = tsp_length(d, tour)
        tour, length = tsp_LK(d, tour, length)  # Repair with an ejection chain
        if length < best_length:
            best_tour = tour[:]
            best_length = length
            print('Record to record {:d}\t {:d}'.format(iteration + 1, length))
    return best_tour, best_length
```
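Code 1 depends on Listings 12.1 to 12.3. As a self-contained sketch of Dueck's tolerance parameter, the version below accepts a repaired solution as the new current one whenever its length does not exceed the best length plus `tolerance`; with `tolerance=0` it is close in spirit to Code 1. A plain first-improvement 2-opt replaces `tsp_LK`, and Python's `random` module replaces `unif`. All names here are hypothetical.

```python
import random

def tour_length(d, tour):
    """Length of the closed tour visiting the cities in the given order."""
    return sum(d[tour[i - 1]][tour[i]] for i in range(len(tour)))

def two_opt(d, tour):
    """First-improvement 2-opt, used here as a stand-in for tsp_LK."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b, c, e = tour[i], tour[i + 1], tour[j], tour[(j + 1) % n]
                if a == e:                # the two edges share a city
                    continue
                if d[a][c] + d[b][e] - d[a][b] - d[c][e] < -1e-9:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour, tour_length(d, tour)

def record_to_record(d, tour, iterations, tolerance=0.0, seed=0):
    """Keep the repaired solution as current if it is at most `tolerance`
    longer than the record (best length found so far)."""
    rng = random.Random(seed)
    n = len(tour)
    best, best_length = two_opt(d, tour[:])
    current = best[:]
    for _ in range(iterations):
        candidate = current[:]
        for _ in range(2):                # perturbation: two random swaps
            u, v = rng.randrange(n), rng.randrange(n)
            candidate[u], candidate[v] = candidate[v], candidate[u]
        candidate, length = two_opt(d, candidate)
        if length <= best_length + tolerance:   # record-to-record acceptance
            current = candidate
        if length < best_length:
            best, best_length = candidate[:], length
    return best, best_length
```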


Problems of Chap. 8 


8.1 Artificial Ants for Steiner Tree 
The trails can be stored in an array indexed by the elements of a solution. If we 
choose a model where an element is a Steiner node, the a priori interest could be based 
on the cost of the minimum weight spanning tree over the terminal nodes plus the 
Steiner nodes selected by the ant. However, this modeling poses a problem: how to decide 
when the ant should stop incorporating Steiner nodes before returning its solution? 
If we select a model where an element e of a solution represents an edge of the 
tree, the a priori interest simply depends on the weight of edge e (lighter edges being 
more attractive), provided e can be added without creating a cycle. The a posteriori 
interest is proportional to $\tau_e$. An ant builds a solution edge by edge, taking care 
not to produce a cycle. It can stop as soon as all the terminal nodes are present in the 
tree. This second model seems better adapted to an ant algorithm. 
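The construction of a single ant with the edge model can be sketched as follows. The tree grows from one terminal; only edges leaving the current tree are candidates, so no cycle can appear, and the ant stops once every terminal is reached. The selection rule (trail value multiplied by the inverse of the edge weight), the data layout and the final pruning of Steiner leaves are assumptions of this sketch, and `ant_steiner_tree` is a hypothetical name. Trail updates are not shown.

```python
import random

def ant_steiner_tree(adj, terminals, tau, rng):
    """One ant builds a Steiner tree edge by edge.
    adj: dict vertex -> dict neighbour -> weight (connected undirected graph);
    tau: dict frozenset({u, v}) -> trail value of edge {u, v}."""
    terminals = set(terminals)
    in_tree = {min(terminals)}            # grow the tree from one terminal
    edges = []
    while not terminals <= in_tree:       # stop once every terminal is reached
        # Only edges leaving the current tree are candidates: no cycle can appear
        candidates = [(u, v) for u in in_tree for v in adj[u] if v not in in_tree]
        # Assumed rule: trail (a posteriori) times 1/weight (a priori)
        scores = [tau[frozenset((u, v))] / adj[u][v] for u, v in candidates]
        u, v = rng.choices(candidates, weights=scores)[0]
        in_tree.add(v)
        edges.append((u, v))
    # Extra step of this sketch: prune Steiner leaves, which only add weight
    pruned = True
    while pruned:
        pruned = False
        degree = {}
        for u, v in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        for e in edges:
            if any(degree[x] == 1 and x not in terminals for x in e):
                edges.remove(e)
                pruned = True
                break
    return edges, sum(adj[u][v] for u, v in edges)
```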


8.2 Tuning the FANT Parameter 

For small problem instances, it is challenging to adjust the parameter of the 
FANT method. Indeed, the local search produces solutions whose quality is almost 
independent of that of the initial solutions. For numbers of iterations above 100, 
a parameter value of at least 200 seems to produce solutions of moderately better 
quality. 


8.3 Vocabulary Building for Graph Coloring 

The solution fragments to be stored in the dictionary can be maximal stable 
sets of the graph. Indeed, all vertices of a stable set can be colored with the same 
color. A solution can be obtained by selecting a minimum number of stable sets 
covering all the vertices of the graph. If such a subset of stable sets can be found, it 
directly yields a feasible coloring: a vertex occurring in several of the selected 
stable sets receives an arbitrary color corresponding to one of them. 
In practice, to attempt to obtain a coloring with a fixed number of colors, 
one constructs a tentative solution with slightly fewer stable sets than this number. 
The uncovered vertices define a subgraph, which is colored independently. If this 
subgraph is not too large, an exact method can be used. The vertices receiving 
the same color in this subgraph define a stable set, not necessarily maximal in the 
complete graph. This set can be extended to a maximal stable set and added to the 
solution fragments in the dictionary. 
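Two elementary operations of this scheme are sketched below, with hypothetical names and a graph assumed to be stored as a dictionary mapping each vertex to the set of its neighbours: greedily extending a stable set to a maximal one, and turning a collection of stable sets into a coloring in which a vertex covered several times receives the color of the first set containing it. The vertices left uncovered are returned so that they can be colored separately.

```python
def extend_to_maximal(adj, stable):
    """Greedily add vertices with no neighbour in the set until it is maximal."""
    stable = set(stable)
    for v in sorted(adj):
        if v not in stable and not (adj[v] & stable):
            stable.add(v)
    return stable

def coloring_from_cover(stables, vertices):
    """Color each vertex with the index of one stable set containing it.
    Returns (color dict, set of uncovered vertices)."""
    color = {}
    for c, s in enumerate(stables):
        for v in s:
            color.setdefault(v, c)    # arbitrary choice: first set containing v
    return color, set(vertices) - set(color)
```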


Problems of Chap. 9 


9.1 Taboo Search for an Explicit Function 
Starting from the solution (−7, −6), the sequence of visited values with a taboo 
duration d = 3 is: 92 → 58 → 35 → 32 → 17 → −4 → −3 → −10 → 0 → 
4 → 3 → 29 → 38 → 57 → 74 → 98 → 129 → 109 → 105 → 97 → 70 → 
46 → 25 → −4 → 0 → −10. 

With a taboo duration d = 1, starting from the solution (−7, 7), we have 248 → 
193 → 138 → 123 → 89 → 58 → 34 → 17 → 14 → 5 → −10 → −3 → 
456—715 14->6>-4->-3-> -10-0--4-3>8-> 
0 


9.2 Taboo Search for the VRP 
Here are some possibilities for defining taboo conditions for the VRP: 


• For $d_1$ iterations, forbid deleting an arc that has just been added and, for $d_2$ 
iterations, forbid adding an arc that has just been deleted. 

• For d iterations, forbid moving a customer back into a tour that it has just left. 

• For d iterations, forbid modifying again a tour that has just been modified. 
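The first of these conditions can be implemented by storing, for each arc, the last iteration during which its deletion (or its addition) is forbidden. In the sketch below, the class `ArcTaboo` and its methods are hypothetical; an arc is any hashable value, such as a `(from, to)` tuple. A move performed at iteration t makes the reverse operation taboo from iteration t + 1 to t + d1 (or t + d2).

```python
class ArcTaboo:
    """Taboo conditions on arcs: an added arc may not be deleted for d1
    iterations, and a deleted arc may not be added again for d2 iterations."""

    def __init__(self, d1, d2):
        self.d1, self.d2 = d1, d2
        self.no_delete_until = {}   # arc -> last iteration where deleting it is taboo
        self.no_add_until = {}      # arc -> last iteration where adding it is taboo

    def record_move(self, added, deleted, iteration):
        for arc in added:
            self.no_delete_until[arc] = iteration + self.d1
        for arc in deleted:
            self.no_add_until[arc] = iteration + self.d2

    def is_taboo(self, added, deleted, iteration):
        """A move is taboo if it re-adds a recently deleted arc or deletes
        a recently added one."""
        return (any(self.no_add_until.get(a, -1) >= iteration for a in added)
                or any(self.no_delete_until.get(a, -1) >= iteration for a in deleted))
```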


9.3 Taboo Search for the QAP 
The solutions successively visited are: 


(1,2,3,4,5) → (2,1,3,4,5) → (3,1,2,4,5) → (3,2,1,4,5) → 
(2,3,1,4,5) → (4,3,1,2,5) → (4,3,5,2,1) → (4,2,5,3,1) → 
(2,4,5,3,1) → (2,4,5,1,3) → (3,4,5,1,2). 
Table 1 Move evaluation for the application of a taboo search to a small quadratic assignment 
instance 

[Table body: for iterations 1 to 10, the cost of the solution before the move (66 at the first iteration) and the value of each move (1,2), (1,3), …, (4,5)]

Table 1 provides, for the first ten iterations: 


• The cost of the solution before performing a move 
• The cost differential of each move, with the value of the chosen move in bold, 
and in italics if it is forbidden 
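The instance behind Table 1 is not given in this section, so the following is only a generic sketch of a taboo search for the quadratic assignment problem with swap moves (r, s). It assumes that unit i is placed at location p[i], and uses one common taboo condition: a swap is forbidden if both units would return to locations they occupied during the last `tenure` iterations, unless the move improves the best solution (aspiration). Names are hypothetical, and moves are evaluated naively in O(n^2) instead of with an incremental formula.

```python
from itertools import combinations

def qap_cost(flow, dist, p):
    """Cost of assigning unit i to location p[i]."""
    n = len(p)
    return sum(flow[i][j] * dist[p[i]][p[j]] for i in range(n) for j in range(n))

def taboo_search_qap(flow, dist, p, iterations, tenure):
    p = list(p)
    n = len(p)
    cost = qap_cost(flow, dist, p)
    best_p, best_cost = p[:], cost
    # taboo[i][loc]: last iteration during which unit i may not return to loc
    taboo = [[-1] * n for _ in range(n)]
    for it in range(iterations):
        chosen = None
        for r, s in combinations(range(n), 2):       # all swap moves (r, s)
            q = p[:]
            q[r], q[s] = q[s], q[r]
            delta = qap_cost(flow, dist, q) - cost
            forbidden = taboo[r][q[r]] >= it and taboo[s][q[s]] >= it
            aspiration = cost + delta < best_cost     # improves the best: allowed
            if (not forbidden or aspiration) and (chosen is None or delta < chosen[0]):
                chosen = (delta, r, s)
        if chosen is None:                            # every move is taboo
            continue
        delta, r, s = chosen
        taboo[r][p[r]] = taboo[s][p[s]] = it + tenure  # forbid going back
        p[r], p[s] = p[s], p[r]
        cost += delta
        if cost < best_cost:
            best_p, best_cost = p[:], cost
    return best_p, best_cost
```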


Problems of Chap. 10 


10.1 Genetic Algorithm for a One-Dimensional Function 

A standard binary representation of a solution is not appropriate. Indeed, two 
solutions with very close values can have very different representations. For 
example, (10000000) is completely different from (01111111), even though their 
values differ by only one unit. For this type of problem, it is better to choose a Gray 
coding, where the code of x is given by $x \oplus \lfloor x/2 \rfloor$ (⊕ denoting the bitwise 
exclusive OR). 
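The Gray transformation and its inverse take only a few lines of Python; the function names are hypothetical. With this coding, 127 becomes 01000000 and 128 becomes 11000000, so the two neighbouring values differ by a single bit, and more generally the codes of x and x + 1 always differ by exactly one bit.

```python
def gray_encode(x):
    """Gray code of a non-negative integer: x XOR floor(x / 2)."""
    return x ^ (x >> 1)

def gray_decode(g):
    """Inverse transformation: XOR of all right shifts of g."""
    x = 0
    while g:
        x ^= g
        g >>= 1
    return x
```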


10.2 Inversion Sequence 

The inversion sequence (4,2,3,0,1,0) corresponds to the permutation 
(4,6,2,5,1,3), $s_i$ being the number of elements greater than i placed before i. 
The sequence (0,0,3,1,2,0) is not an inversion sequence: element 5 cannot have two 
elements greater than itself in a permutation of 6 elements. A sequence 
$s_i$ ($i = 1, \dots, n$) is an inversion sequence if and only if $0 \leq s_i \leq n - i$. The 
standard crossover operators can be used directly with inversion sequences, as 
they preserve the property stated above. The drawback is that the corresponding 
offspring permutations cannot be constructed in linear time. 
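The conversions between permutations and inversion sequences can be sketched as below (hypothetical names). The permutation is rebuilt by inserting n, n − 1, …, 1 in turn: element i goes at index $s_i$ among the larger elements already placed, which is always possible when $s_i \leq n - i$. This simple list-based version runs in O(n^2).

```python
def to_inversion_sequence(perm):
    """s[i-1] = number of elements greater than i placed before i in perm."""
    position = {v: k for k, v in enumerate(perm)}
    return [sum(1 for w in perm[:position[i]] if w > i)
            for i in range(1, len(perm) + 1)]

def to_permutation(s):
    """Insert n, n-1, ..., 1; element i goes at index s[i-1] among the
    larger elements already placed (O(n^2) with a Python list)."""
    perm = []
    for i in range(len(s), 0, -1):
        perm.insert(s[i - 1], i)
    return perm

def is_inversion_sequence(s):
    n = len(s)
    return all(0 <= s[i - 1] <= n - i for i in range(1, n + 1))
```

For instance, a one-point crossover of (4,2,3,0,1,0) and (5,4,3,2,1,0) after the third position gives (4,2,3,2,1,0), which is again a valid inversion sequence and decodes to (6,5,2,4,1,3).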
Table 2 Set of solutions generated in the first iteration of scatter search. The best three become 
the new elite set; identical ones are deleted. The best two that are the most different from an elite 
solution are retained 

[Table body: columns Generated solution, Repaired solution and Distances; each solution is marked Elite, Retained or Deleted]

In the context of scatter search, k solutions $s^1, \dots, s^k$ can be mixed by rounding 
the elements of AU (with $f(\cdot)$ to maximize). 


10.3 Rank-Based Selection 
The probability of rank_based_selection(m) returning v is 

$P(v) = \frac{2(m - v + 1)}{m(m + 1)}$ 

These probabilities sum to 1, since $\sum_{v=1}^{m} (m - v + 1) = m(m + 1)/2$. 


10.4 Tuning a Genetic Algorithm 

Two parameter settings seem appropriate: with a zero mutation rate, a 
relatively large population (100 solutions) should be adopted. With a single random 
mutation after crossover, a population of a dozen solutions is adequate. 


10.5 Scatter Search for the Knapsack Problem 
Table 2 provides the list of solutions produced at the first generation of a scatter 
search applied to a knapsack instance. 
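Generated solutions usually have to be repaired before their distances to the elite solutions are computed. The sketch below shows one possible greedy repair for a binary knapsack solution, followed by the Hamming distance between two solutions. The repair rule (first drop the packed items with the worst value/weight ratio until the capacity is respected, then add the unpacked items with the best ratio that still fit) is an assumption and not necessarily the rule used for Table 2; all names are hypothetical.

```python
def repair(x, weights, values, capacity):
    """Greedy repair of a 0/1 knapsack solution (assumed rule)."""
    x = list(x)
    order = sorted(range(len(x)), key=lambda i: values[i] / weights[i])
    load = sum(w for w, xi in zip(weights, x) if xi)
    for i in order:                       # worst ratio first: restore feasibility
        if load <= capacity:
            break
        if x[i]:
            x[i] = 0
            load -= weights[i]
    for i in reversed(order):             # best ratio first: fill remaining space
        if not x[i] and load + weights[i] <= capacity:
            x[i] = 1
            load += weights[i]
    return x

def hamming(x, y):
    """Number of positions where two binary solutions differ."""
    return sum(a != b for a, b in zip(x, y))
```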
Table 3 Empirical number of elements found by the Pareto local search and number of elements 
compared when using a KD-tree or a linked list. n is the number of TSP cities. For K = 4 and 
n = 30, the number of comparisons is larger than $3 \cdot 10^{11}$ with a KD-tree. With a linked list, it 
would have taken several weeks or months of calculation for the program to finish 

Problems of Chap. 11 


11.1 Data Structure Options for Multi-objective Optimization 

The number of comparisons grows much faster with a linked list than with a 
KD-tree. This growth seems polynomial in the number of cities of the problem 
instance. The size of the Pareto set also grows polynomially. The degree of these 
polynomials grows very strongly with K, the number of objectives. Table 3 gives an 
estimate of these degrees for instances with random distance matrices, uncorrelated 
between objectives. Because of the recursive algorithm for deleting an element in a 
KD-tree, about twice as many element removal queries must be made as with a list. 
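The list-based option can be sketched as a Pareto archive stored in a plain Python list, with all objectives assumed to be minimized; the class and function names are hypothetical, and the KD-tree variant is not shown. Each insertion compares the new point with every stored one, which explains why the number of comparisons grows so quickly with the size of the Pareto set.

```python
def dominates(a, b):
    """True if a dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

class ListArchive:
    """Pareto set stored in a plain list, counting element comparisons."""

    def __init__(self):
        self.points = []
        self.comparisons = 0

    def insert(self, p):
        """Insert p unless it is dominated or already present, and remove
        the points it dominates. Returns True if p was inserted."""
        kept = []
        for q in self.points:
            self.comparisons += 1
            if q == p or dominates(q, p):
                # A dominated p cannot dominate any archived point, so the
                # archive can be left unchanged
                return False
            if not dominates(p, q):
                kept.append(q)
        self.points = kept + [p]
        return True
```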


11.2 Comparison of a true Simulated Annealing and a Kind of SA with 
Systematic Neighborhood Evaluation 

The noising method code that systematically evaluates the neighborhood allows 
many more iterations for the same computation time. At low temperatures, 
convergence is faster, as shown in Fig. 12. Both methods stop after just under 10 million 
iterations. 
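The codes compared in Fig. 12 are not reproduced in this section. The hypothetical sketch below only shows where the two variants differ: a classical simulated annealing draws a random swap at each iteration, whereas the systematic variant scans all swaps in a fixed cyclic order; both use the Metropolis acceptance rule and geometric cooling. The swap neighborhood, the parameters and the function name are assumptions.

```python
import math
import random

def annealing(cost, s, t0, alpha, iterations, systematic, seed=0):
    """Swap-based annealing. systematic=False draws a random swap at each
    iteration (classical SA); systematic=True scans all swaps in a fixed
    cyclic order."""
    rng = random.Random(seed)
    s = list(s)
    n = len(s)
    moves = [(i, j) for i in range(n) for j in range(i + 1, n)]
    c = cost(s)
    best, best_c = s[:], c
    t = t0
    for it in range(iterations):
        i, j = moves[it % len(moves)] if systematic else rng.choice(moves)
        s[i], s[j] = s[j], s[i]
        delta = cost(s) - c
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            c += delta                    # Metropolis acceptance
            if c < best_c:
                best, best_c = s[:], c
        else:
            s[i], s[j] = s[j], s[i]       # undo the rejected move
        t = max(t * alpha, 1e-12)         # geometric cooling, kept positive
    return best, best_c
```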
Fig. 12 Comparison of a basic simulated annealing and a simulated annealing with systematic 
neighborhood evaluation (kind of noising method). Median solution value as a function of the 
number of iterations 